{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "81a6f12e",
   "metadata": {},
   "source": [
    "下面几段代码展示朴素贝叶斯模型的训练和预测。这里使用的数据集为本书自制的Books数据集，包含约1万本图书的标题，分为3种主题。首先是预处理，针对文本分类的预处理主要包含以下步骤：\n",
    "\n",
    "- 通常可以将英文文本全部转换为小写，或者将中文内容全部转换为简体，等等，这一般不会改变文本内容。\n",
    "- 去除标点。英文中的标点符号和单词之间没有空格（如——“Hi, there!”），如果不去除标点，“Hi,”和“there!”会被识别为不同于“Hi”和“there”的两个词，这显然是不合理的。对于中文，移除标点一般也不会影响文本的内容。\n",
    "- 分词。中文汉字之间没有空格分隔，中文分词有时比英文分词更加困难，此处不再赘述。\n",
    "- 去除停用词（如“I”、“is”、“的”等）。这些词往往大量出现但没有具体含义。\n",
    "- 建立词表。通常会忽略语料库中频率非常低的词。\n",
    "- 将词转换为词表索引（ID），便于机器学习模型使用。"
   ]
  },
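  {
   "cell_type": "markdown",
   "id": "b7e4c1a9",
   "metadata": {},
   "source": [
    "The preprocessing steps listed above can be sketched on a toy English corpus. This is a minimal illustration that uses only the Python standard library; it is not the pipeline applied to the Books dataset. The corpus, the stop-word set, and the names `preprocess`, `token2id`, and `min_freq` are assumptions made for this example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d2f8a6c3",
   "metadata": {},
   "outputs": [],
   "source": [
    "import string\n",
    "from collections import Counter\n",
    "\n",
    "# Hypothetical toy corpus and stop-word set, for illustration only\n",
    "corpus = ['Hi, there! This is a book.', 'The book is about Python.']\n",
    "stop_words = {'i', 'is', 'a', 'the', 'this'}\n",
    "\n",
    "def preprocess(text):\n",
    "    # Lowercase, strip ASCII punctuation, split on whitespace, drop stop words\n",
    "    table = str.maketrans('', '', string.punctuation)\n",
    "    text = text.lower().translate(table)\n",
    "    return [tok for tok in text.split() if tok not in stop_words]\n",
    "\n",
    "docs = [preprocess(text) for text in corpus]\n",
    "\n",
    "# Build the vocabulary, keeping tokens that occur at least min_freq times\n",
    "min_freq = 1\n",
    "counts = Counter(tok for doc in docs for tok in doc)\n",
    "kept = [tok for tok, c in counts.most_common() if c >= min_freq]\n",
    "token2id = {tok: idx for idx, tok in enumerate(kept)}\n",
    "\n",
    "# Map every document to a list of vocabulary IDs, skipping unknown tokens\n",
    "ids = [[token2id[tok] for tok in doc if tok in token2id] for doc in docs]\n",
    "print(docs)\n",
    "print(ids)"
   ]
  },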
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "4cd78a10",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple\n",
      "Requirement already satisfied: requests in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (2.32.3)\n",
      "Requirement already satisfied: charset-normalizer<4,>=2 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from requests) (3.4.2)\n",
      "Requirement already satisfied: idna<4,>=2.5 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from requests) (3.10)\n",
      "Requirement already satisfied: urllib3<3,>=1.21.1 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from requests) (2.4.0)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from requests) (2025.4.26)\n",
      "Note: you may need to restart the kernel to use updated packages.\n"
     ]
    }
   ],
   "source": [
    "pip install requests"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "38df35ad",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple\n",
      "Requirement already satisfied: spacy in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (3.8.5)\n",
      "Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.11 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (3.0.12)\n",
      "Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (1.0.5)\n",
      "Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (1.0.12)\n",
      "Requirement already satisfied: cymem<2.1.0,>=2.0.2 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (2.0.11)\n",
      "Requirement already satisfied: preshed<3.1.0,>=3.0.2 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (3.0.9)\n",
      "Requirement already satisfied: thinc<8.4.0,>=8.3.4 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (8.3.6)\n",
      "Requirement already satisfied: wasabi<1.2.0,>=0.9.1 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (1.1.3)\n",
      "Requirement already satisfied: srsly<3.0.0,>=2.4.3 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (2.5.1)\n",
      "Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (2.0.10)\n",
      "Requirement already satisfied: weasel<0.5.0,>=0.1.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (0.4.1)\n",
      "Requirement already satisfied: typer<1.0.0,>=0.3.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (0.15.3)\n",
      "Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (4.67.1)\n",
      "Requirement already satisfied: numpy>=1.19.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (2.2.5)\n",
      "Requirement already satisfied: requests<3.0.0,>=2.13.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (2.32.3)\n",
      "Requirement already satisfied: pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (2.11.4)\n",
      "Requirement already satisfied: jinja2 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (3.1.6)\n",
      "Requirement already satisfied: setuptools in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (80.3.1)\n",
      "Requirement already satisfied: packaging>=20.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (25.0)\n",
      "Requirement already satisfied: langcodes<4.0.0,>=3.2.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (3.5.0)\n",
      "Requirement already satisfied: language-data>=1.2 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from langcodes<4.0.0,>=3.2.0->spacy) (1.3.0)\n",
      "Requirement already satisfied: annotated-types>=0.6.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4->spacy) (0.7.0)\n",
      "Requirement already satisfied: pydantic-core==2.33.2 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4->spacy) (2.33.2)\n",
      "Requirement already satisfied: typing-extensions>=4.12.2 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4->spacy) (4.13.2)\n",
      "Requirement already satisfied: typing-inspection>=0.4.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4->spacy) (0.4.0)\n",
      "Requirement already satisfied: charset-normalizer<4,>=2 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from requests<3.0.0,>=2.13.0->spacy) (3.4.2)\n",
      "Requirement already satisfied: idna<4,>=2.5 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from requests<3.0.0,>=2.13.0->spacy) (3.10)\n",
      "Requirement already satisfied: urllib3<3,>=1.21.1 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from requests<3.0.0,>=2.13.0->spacy) (2.4.0)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from requests<3.0.0,>=2.13.0->spacy) (2025.4.26)\n",
      "Requirement already satisfied: blis<1.4.0,>=1.3.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from thinc<8.4.0,>=8.3.4->spacy) (1.3.0)\n",
      "Requirement already satisfied: confection<1.0.0,>=0.0.1 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from thinc<8.4.0,>=8.3.4->spacy) (0.1.5)\n",
      "Requirement already satisfied: colorama in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from tqdm<5.0.0,>=4.38.0->spacy) (0.4.6)\n",
      "Requirement already satisfied: click>=8.0.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from typer<1.0.0,>=0.3.0->spacy) (8.1.8)\n",
      "Requirement already satisfied: shellingham>=1.3.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from typer<1.0.0,>=0.3.0->spacy) (1.5.4)\n",
      "Requirement already satisfied: rich>=10.11.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from typer<1.0.0,>=0.3.0->spacy) (14.0.0)\n",
      "Requirement already satisfied: cloudpathlib<1.0.0,>=0.7.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from weasel<0.5.0,>=0.1.0->spacy) (0.21.0)\n",
      "Requirement already satisfied: smart-open<8.0.0,>=5.2.1 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from weasel<0.5.0,>=0.1.0->spacy) (7.1.0)\n",
      "Requirement already satisfied: wrapt in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from smart-open<8.0.0,>=5.2.1->weasel<0.5.0,>=0.1.0->spacy) (1.17.2)\n",
      "Requirement already satisfied: marisa-trie>=1.1.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from language-data>=1.2->langcodes<4.0.0,>=3.2.0->spacy) (1.2.1)\n",
      "Requirement already satisfied: markdown-it-py>=2.2.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from rich>=10.11.0->typer<1.0.0,>=0.3.0->spacy) (3.0.0)\n",
      "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from rich>=10.11.0->typer<1.0.0,>=0.3.0->spacy) (2.19.1)\n",
      "Requirement already satisfied: mdurl~=0.1 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from markdown-it-py>=2.2.0->rich>=10.11.0->typer<1.0.0,>=0.3.0->spacy) (0.1.2)\n",
      "Requirement already satisfied: MarkupSafe>=2.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from jinja2->spacy) (3.0.2)\n",
      "Note: you may need to restart the kernel to use updated packages.\n"
     ]
    }
   ],
   "source": [
    "pip install spacy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "3eb2583e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple\n",
      "Requirement already satisfied: spacy in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (3.8.5)\n",
      "Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.11 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (3.0.12)\n",
      "Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (1.0.5)\n",
      "Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (1.0.12)\n",
      "Requirement already satisfied: cymem<2.1.0,>=2.0.2 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (2.0.11)\n",
      "Requirement already satisfied: preshed<3.1.0,>=3.0.2 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (3.0.9)\n",
      "Requirement already satisfied: thinc<8.4.0,>=8.3.4 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (8.3.6)\n",
      "Requirement already satisfied: wasabi<1.2.0,>=0.9.1 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (1.1.3)\n",
      "Requirement already satisfied: srsly<3.0.0,>=2.4.3 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (2.5.1)\n",
      "Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (2.0.10)\n",
      "Requirement already satisfied: weasel<0.5.0,>=0.1.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (0.4.1)\n",
      "Requirement already satisfied: typer<1.0.0,>=0.3.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (0.15.3)\n",
      "Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (4.67.1)\n",
      "Requirement already satisfied: numpy>=1.19.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (2.2.5)\n",
      "Requirement already satisfied: requests<3.0.0,>=2.13.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (2.32.3)\n",
      "Requirement already satisfied: pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (2.11.4)\n",
      "Requirement already satisfied: jinja2 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (3.1.6)\n",
      "Requirement already satisfied: setuptools in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (80.3.1)\n",
      "Requirement already satisfied: packaging>=20.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (25.0)\n",
      "Requirement already satisfied: langcodes<4.0.0,>=3.2.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from spacy) (3.5.0)\n",
      "Requirement already satisfied: language-data>=1.2 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from langcodes<4.0.0,>=3.2.0->spacy) (1.3.0)\n",
      "Requirement already satisfied: annotated-types>=0.6.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4->spacy) (0.7.0)\n",
      "Requirement already satisfied: pydantic-core==2.33.2 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4->spacy) (2.33.2)\n",
      "Requirement already satisfied: typing-extensions>=4.12.2 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4->spacy) (4.13.2)\n",
      "Requirement already satisfied: typing-inspection>=0.4.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4->spacy) (0.4.0)\n",
      "Requirement already satisfied: charset-normalizer<4,>=2 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from requests<3.0.0,>=2.13.0->spacy) (3.4.2)\n",
      "Requirement already satisfied: idna<4,>=2.5 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from requests<3.0.0,>=2.13.0->spacy) (3.10)\n",
      "Requirement already satisfied: urllib3<3,>=1.21.1 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from requests<3.0.0,>=2.13.0->spacy) (2.4.0)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from requests<3.0.0,>=2.13.0->spacy) (2025.4.26)\n",
      "Requirement already satisfied: blis<1.4.0,>=1.3.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from thinc<8.4.0,>=8.3.4->spacy) (1.3.0)\n",
      "Requirement already satisfied: confection<1.0.0,>=0.0.1 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from thinc<8.4.0,>=8.3.4->spacy) (0.1.5)\n",
      "Requirement already satisfied: colorama in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from tqdm<5.0.0,>=4.38.0->spacy) (0.4.6)\n",
      "Requirement already satisfied: click>=8.0.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from typer<1.0.0,>=0.3.0->spacy) (8.1.8)\n",
      "Requirement already satisfied: shellingham>=1.3.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from typer<1.0.0,>=0.3.0->spacy) (1.5.4)\n",
      "Requirement already satisfied: rich>=10.11.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from typer<1.0.0,>=0.3.0->spacy) (14.0.0)\n",
      "Requirement already satisfied: cloudpathlib<1.0.0,>=0.7.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from weasel<0.5.0,>=0.1.0->spacy) (0.21.0)\n",
      "Requirement already satisfied: smart-open<8.0.0,>=5.2.1 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from weasel<0.5.0,>=0.1.0->spacy) (7.1.0)\n",
      "Requirement already satisfied: wrapt in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from smart-open<8.0.0,>=5.2.1->weasel<0.5.0,>=0.1.0->spacy) (1.17.2)\n",
      "Requirement already satisfied: marisa-trie>=1.1.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from language-data>=1.2->langcodes<4.0.0,>=3.2.0->spacy) (1.2.1)\n",
      "Requirement already satisfied: markdown-it-py>=2.2.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from rich>=10.11.0->typer<1.0.0,>=0.3.0->spacy) (3.0.0)\n",
      "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from rich>=10.11.0->typer<1.0.0,>=0.3.0->spacy) (2.19.1)\n",
      "Requirement already satisfied: mdurl~=0.1 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from markdown-it-py>=2.2.0->rich>=10.11.0->typer<1.0.0,>=0.3.0->spacy) (0.1.2)\n",
      "Requirement already satisfied: MarkupSafe>=2.0 in c:\\users\\孔子\\desktop\\社会网络舆情\\@hands-on-nlp-main\\@hands-on-nlp-main\\.venv\\lib\\site-packages (from jinja2->spacy) (3.0.2)\n",
      "Note: you may need to restart the kernel to use updated packages.\n"
     ]
    }
   ],
   "source": [
    "pip install -U spacy -i https://pypi.tuna.tsinghua.edu.cn/simple"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "92a0323b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple\n",
      "Collecting zh-core-web-sm==3.7.0\n",
      "Note: you may need to restart the kernel to use updated packages.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "  ERROR: HTTP error 404 while getting https://mirror.tuna.tsinghua.edu.cn/github-release/explosion/spacy-models/zh_core_web_sm-3.7.0/zh_core_web_sm-3.7.0-py3-none-any.whl\n",
      "ERROR: Could not install requirement zh-core-web-sm==3.7.0 from https://mirror.tuna.tsinghua.edu.cn/github-release/explosion/spacy-models/zh_core_web_sm-3.7.0/zh_core_web_sm-3.7.0-py3-none-any.whl because of HTTP error 404 Client Error: Not Found for url: https://mirror.tuna.tsinghua.edu.cn/github-release/explosion/spacy-models/zh_core_web_sm-3.7.0/zh_core_web_sm-3.7.0-py3-none-any.whl for URL https://mirror.tuna.tsinghua.edu.cn/github-release/explosion/spacy-models/zh_core_web_sm-3.7.0/zh_core_web_sm-3.7.0-py3-none-any.whl\n"
     ]
    }
   ],
   "source": [
    "pip install https://mirror.tuna.tsinghua.edu.cn/github-release/explosion/spacy-models/zh_core_web_sm-3.7.0/zh_core_web_sm-3.7.0-py3-none-any.whl"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "5936ceb0",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "train size = 8627 , test size = 2157\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 8627/8627 [00:44<00:00, 195.13it/s]\n",
      "100%|██████████| 2157/2157 [00:10<00:00, 201.69it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['python', '编程', '入门', '教程']\n",
      "{'计算机类': 0, '艺术传媒类': 1, '经管类': 2}\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "import json\n",
    "import os\n",
    "import requests\n",
    "import re\n",
    "from tqdm import tqdm\n",
    "from collections import defaultdict\n",
    "from string import punctuation\n",
    "import spacy\n",
    "from spacy.lang.zh.stop_words import STOP_WORDS\n",
    "nlp = spacy.load('zh_core_web_sm')\n",
    "\n",
    "\n",
    "class BooksDataset:\n",
    "    def __init__(self):\n",
    "        train_file, test_file = 'train.jsonl', 'test.jsonl'\n",
    "\n",
    "        # Each file is in JSON Lines format: parse every line into a Python object\n",
    "        def read_file(file_name):\n",
    "            with open(file_name, 'r', encoding='utf-8') as fin:\n",
    "                json_list = list(fin)\n",
    "            data_split = []\n",
    "            for json_str in json_list:\n",
    "                data_split.append(json.loads(json_str))\n",
    "            return data_split\n",
    "\n",
    "        self.train_data, self.test_data = read_file(train_file),\\\n",
    "            read_file(test_file)\n",
    "        print('train size =', len(self.train_data), \n",
    "              ', test size =', len(self.test_data))\n",
    "        \n",
    "        # Build the mapping between text labels and integer label IDs\n",
    "        self.label2id, self.id2label = {}, {}\n",
    "        for data_split in [self.train_data, self.test_data]:\n",
    "            for data in data_split:\n",
    "                txt = data['class']\n",
    "                if txt not in self.label2id:\n",
    "                    idx = len(self.label2id)\n",
    "                    self.label2id[txt] = idx\n",
    "                    self.id2label[idx] = txt\n",
    "                label_id = self.label2id[txt]\n",
    "                data['label'] = label_id\n",
    "\n",
    "    def tokenize(self, attr='book'):\n",
    "        # spaCy is used for Chinese word segmentation; install it with:\n",
    "        # pip install -U spacy\n",
    "        # python -m spacy download zh_core_web_sm\n",
    "        # Tokenize each text and remove stop words\n",
    "        for data_split in [self.train_data, self.test_data]:\n",
    "            for data in tqdm(data_split):\n",
    "                # Convert to lowercase\n",
    "                text = data[attr].lower()\n",
    "                # Tokenize and drop stop words; most punctuation, being\n",
    "                # single-character tokens, is filtered later by build_vocab (min_len=2)\n",
    "                tokens = [t.text for t in nlp(text) \\\n",
    "                    if t.text not in STOP_WORDS]\n",
    "                # Tokenization is slow, so cache the result on the record\n",
    "                data['tokens'] = tokens\n",
    "\n",
    "    # Build the vocabulary from the tokenized training data, dropping\n",
    "    # low-frequency tokens: a token is kept only if freq > min_freq (strictly)\n",
    "    # and len(token) >= min_len; max_size optionally caps the vocabulary size\n",
    "    def build_vocab(self, min_freq=3, min_len=2, max_size=None):\n",
    "        frequency = defaultdict(int)\n",
    "        for data in self.train_data:\n",
    "            tokens = data['tokens']\n",
    "            for token in tokens:\n",
    "                frequency[token] += 1 \n",
    "\n",
    "        print(f'unique tokens = {len(frequency)}, '+\\\n",
    "              f'total counts = {sum(frequency.values())}, '+\\\n",
    "              f'max freq = {max(frequency.values())}, '+\\\n",
    "              f'min freq = {min(frequency.values())}')    \n",
    "\n",
    "        self.token2id = {}\n",
    "        self.id2token = {}\n",
    "        total_count = 0\n",
    "        for token, freq in sorted(frequency.items(),\\\n",
    "            key=lambda x: -x[1]):\n",
    "            if max_size and len(self.token2id) >= max_size:\n",
    "                break\n",
    "            if freq > min_freq:\n",
    "                if (min_len is None) or (min_len and \\\n",
    "                    len(token) >= min_len):\n",
    "                    self.token2id[token] = len(self.token2id)\n",
    "                    self.id2token[len(self.id2token)] = token\n",
    "                    total_count += freq\n",
    "            else:\n",
    "                break\n",
    "        print(f'min_freq = {min_freq}, min_len = {min_len}, '+\\\n",
    "              f'max_size = {max_size}, '\n",
    "              f'remaining tokens = {len(self.token2id)}, '\n",
    "              f'in-vocab rate = {total_count / sum(frequency.values())}')\n",
    "\n",
    "    # Convert the tokenized documents into lists of vocabulary indices\n",
    "    def convert_tokens_to_ids(self):\n",
    "        for data_split in [self.train_data, self.test_data]:\n",
    "            for data in data_split:\n",
    "                data['token_ids'] = []\n",
    "                for token in data['tokens']:\n",
    "                    if token in self.token2id:\n",
    "                        data['token_ids'].append(self.token2id[token])\n",
    "\n",
    "        \n",
    "dataset = BooksDataset()\n",
    "dataset.tokenize()\n",
    "print(dataset.train_data[0]['tokens'])\n",
    "print(dataset.label2id)"
   ]
  },
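  {
   "cell_type": "markdown",
   "id": "26d79e05",
   "metadata": {},
   "source": [
    "After tokenization, build a vocabulary from the tokens that occur more than 3 times in the training set and are at least 2 characters long, then convert each tokenized document into a sequence of token IDs."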
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "0d6b1918",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "unique tokens = 6956, total counts = 54884, max freq = 1635, min freq = 1\n",
      "min_freq = 3, min_len = 2, max_size = None, remaining tokens = 1650, in-vocab rate = 0.7944209605713869\n",
      "[18, 26, 5, 0]\n"
     ]
    }
   ],
   "source": [
    "dataset.build_vocab(min_freq=3)\n",
    "dataset.convert_tokens_to_ids()\n",
    "print(dataset.train_data[0]['token_ids'])"
   ]
  },
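  {
   "cell_type": "markdown",
   "id": "d096d95f",
   "metadata": {},
   "source": [
    "Next, arrange the data and labels into matrix form for training: each document becomes a bag-of-words count vector over the vocabulary."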
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "ba632265",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "train_X, train_Y = [], []\n",
    "test_X, test_Y = [], []\n",
    "\n",
    "for data in dataset.train_data:\n",
    "    x = np.zeros(len(dataset.token2id), dtype=np.int32)\n",
    "    for token_id in data['token_ids']:\n",
    "        x[token_id] += 1\n",
    "    train_X.append(x)\n",
    "    train_Y.append(data['label'])\n",
    "for data in dataset.test_data:\n",
    "    x = np.zeros(len(dataset.token2id), dtype=np.int32)\n",
    "    for token_id in data['token_ids']:\n",
    "        x[token_id] += 1\n",
    "    test_X.append(x)\n",
    "    test_Y.append(data['label'])\n",
    "train_X, train_Y = np.array(train_X), np.array(train_Y)\n",
    "test_X, test_Y = np.array(test_X), np.array(test_Y)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3938acdb",
   "metadata": {},
   "source": [
    "下面代码展示朴素贝叶斯的训练和预测。"
   ]
  },
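  {
   "cell_type": "markdown",
   "id": "e5b7a913",
   "metadata": {},
   "source": [
    "As background for the next cell, the standard multinomial naive Bayes formulation is sketched here; the symbols are introduced for illustration and are not taken from the notebook's code. With $c$ a class, $V$ the vocabulary, $N_c$ the number of training documents in class $c$, $N$ the total number of training documents, and $\\mathrm{count}(w, c)$ the number of times token $w$ occurs in documents of class $c$, the prior and the Laplace-smoothed likelihood are estimated by counting:\n",
    "\n",
    "$$P(c) = \\frac{N_c}{N}, \\qquad P(w \\mid c) = \\frac{\\mathrm{count}(w, c) + 1}{\\sum_{w' \\in V} \\mathrm{count}(w', c) + |V|}$$\n",
    "\n",
    "A document $d = (w_1, \\dots, w_n)$ is then assigned the class with the largest log-posterior score, where taking logarithms turns a product of many small probabilities into a sum:\n",
    "\n",
    "$$\\hat{y} = \\arg\\max_{c} \\Big[ \\log P(c) + \\sum_{i=1}^{n} \\log P(w_i \\mid c) \\Big]$$"
   ]
  },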
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "f13251b7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "P(计算机类) = 0.4453460067230787\n",
      "P(艺术传媒类) = 0.26660484525327466\n",
      "P(经管类) = 0.2880491480236467\n",
      "P(教程|计算机类) = 0.5726495726495726\n",
      "P(基础|计算机类) = 0.6503006012024048\n",
      "P(设计|计算机类) = 0.606694560669456\n",
      "test example-0, prediction = 0, label = 0\n",
      "test example-1, prediction = 0, label = 0\n",
      "test example-2, prediction = 1, label = 1\n",
      "test example-3, prediction = 1, label = 1\n",
      "test example-4, prediction = 1, label = 1\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "class NaiveBayes:\n",
    "    def __init__(self, num_classes, vocab_size):\n",
    "        self.num_classes = num_classes\n",
    "        self.vocab_size = vocab_size\n",
    "        self.prior = np.zeros(num_classes, dtype=np.float64)\n",
    "        self.likelihood = np.zeros((num_classes, vocab_size),\\\n",
    "            dtype=np.float64)\n",
    "        \n",
    "    def fit(self, X, Y):\n",
    "        # Training naive Bayes amounts to estimating the class priors and\n",
    "        # the per-class token likelihoods, both obtained by simple counting\n",
    "        for x, y in zip(X, Y):\n",
    "            self.prior[y] += 1\n",
    "            for token_id in x:\n",
    "                self.likelihood[y, token_id] += 1\n",
    "                \n",
    "        self.prior /= self.prior.sum()\n",
    "        # Laplace (add-one) smoothing\n",
    "        self.likelihood += 1\n",
    "        # Normalize over the vocabulary axis so that each row is a\n",
    "        # distribution P(token | class) that sums to 1\n",
    "        self.likelihood /= self.likelihood.sum(axis=1, keepdims=True)\n",
    "        # Work with log-probabilities to avoid floating-point underflow\n",
    "        self.prior = np.log(self.prior)\n",
    "        self.likelihood = np.log(self.likelihood)\n",
    "    \n",
    "    def predict(self, X):\n",
    "        # For each class, add the log prior and the log likelihoods of the\n",
    "        # tokens (the product in log space); predict the highest-scoring class\n",
    "        preds = []\n",
    "        for x in X:\n",
    "            p = np.zeros(self.num_classes, dtype=np.float64)\n",
    "            for i in range(self.num_classes):\n",
    "                p[i] += self.prior[i]\n",
    "                for token in x:\n",
    "                    p[i] += self.likelihood[i, token]\n",
    "            preds.append(np.argmax(p))\n",
    "        return preds\n",
    "\n",
    "nb = NaiveBayes(len(dataset.label2id), len(dataset.token2id))\n",
    "train_X, train_Y = [], []\n",
    "for data in dataset.train_data:\n",
    "    train_X.append(data['token_ids'])\n",
    "    train_Y.append(data['label'])\n",
    "nb.fit(train_X, train_Y)\n",
    "\n",
    "for i in range(3):\n",
    "    print(f'P({dataset.id2label[i]}) = {np.exp(nb.prior[i])}')\n",
    "for i in range(3):\n",
    "    print(f'P({dataset.id2token[i]}|{dataset.id2label[0]}) = '+\\\n",
    "          f'{np.exp(nb.likelihood[0, i])}')\n",
    "\n",
    "test_X, test_Y = [], []\n",
    "for data in dataset.test_data:\n",
    "    test_X.append(data['token_ids'])\n",
    "    test_Y.append(data['label'])\n",
    "    \n",
    "NB_preds = nb.predict(test_X)\n",
    "    \n",
    "for i, (p, y) in enumerate(zip(NB_preds, test_Y)):\n",
    "    if i >= 5:\n",
    "        break\n",
    "    print(f'test example-{i}, prediction = {p}, label = {y}')"
   ]
  },
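  {
   "cell_type": "markdown",
   "id": "a1cf6399",
   "metadata": {},
   "source": [
    "Next, we compute document feature vectors with the TF-IDF method introduced in Chapter 3, and use PyTorch to train a logistic regression model and make predictions with it."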
   ]
  },
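  {
   "cell_type": "markdown",
   "id": "c2f8d640",
   "metadata": {},
   "source": [
    "As a reminder of what the TF-IDF features represent, one common definition is given here. It is a sketch under assumptions: the symbols are introduced for illustration, and the `TFIDF` class from Chapter 3 may use a different weighting or smoothing variant. For a token $w$ in document $d$, with $N$ training documents and $\\mathrm{df}(w)$ the number of training documents that contain $w$:\n",
    "\n",
    "$$\\mathrm{tfidf}(w, d) = \\mathrm{tf}(w, d) \\cdot \\log \\frac{N}{\\mathrm{df}(w)}$$\n",
    "\n",
    "Here $\\mathrm{tf}(w, d)$ is the number of times $w$ occurs in $d$, so tokens that are frequent within a document but rare across the corpus receive the largest weights."
   ]
  },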
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "21a3bc79",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import sys\n",
    "\n",
    "sys.path.append('C:/Users/孔子/Desktop/社会网络舆情/@Hands-on-NLP-main/@Hands-on-NLP-main/code')\n",
    "from my_my_utils import TFIDF\n",
    "        \n",
    "tfidf = TFIDF(len(dataset.token2id))\n",
    "tfidf.fit(train_X)\n",
    "train_F = tfidf.transform(train_X)\n",
    "test_F = tfidf.transform(test_X)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dc8af30b",
   "metadata": {},
   "source": [
    "逻辑斯谛回归可以看作一个一层的神经网络模型，使用PyTorch实现可以方便地利用自动求导功能。"
   ]
  },
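  {
   "cell_type": "markdown",
   "id": "9a41be7d",
   "metadata": {},
   "source": [
    "In formula form, and assuming the usual multiclass (softmax) formulation, the model can be sketched as follows; $W$, $b$, $K$ and $N$ are notation introduced here for illustration. Given a feature vector $x$ (here a TF-IDF vector) and $K$ classes, a single linear layer produces scores $z = Wx + b$, which the softmax turns into class probabilities:\n",
    "\n",
    "$$P(y = k \\mid x) = \\frac{\\exp(z_k)}{\\sum_{j=1}^{K} \\exp(z_j)}$$\n",
    "\n",
    "Training minimizes the average cross-entropy over the $N$ training examples, and automatic differentiation supplies the gradients with respect to $W$ and $b$:\n",
    "\n",
    "$$\\mathcal{L}(W, b) = -\\frac{1}{N} \\sum_{i=1}^{N} \\log P(y_i \\mid x_i)$$"
   ]
  },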
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1ddebf0c",
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "epoch-49, loss=0.1800: 100%|█| 50/50 [00:07<00:00,  6.44it/s\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAj8AAAGwCAYAAABGogSnAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjEsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvc2/+5QAAAAlwSFlzAAAPYQAAD2EBqD+naQAAVytJREFUeJzt3Qd8jPcfB/Bv9iIhQiKE2Ftii61SMUq1+q9VSovSahWt0VqlRatVraoORTcdStUWmxBizyAIQURCtsz7v36/uMvzXG7vu+fz7ut6d88999zvnlxyH7/pJJPJZAQAAAAgEc7WLgAAAACAJSH8AAAAgKQg/AAAAICkIPwAAACApCD8AAAAgKQg/AAAAICkIPwAAACApLiSxBQXF9OdO3eofPny5OTkZO3iAAAAgA7YtISZmZkUHBxMzs7G1d1ILvyw4BMSEmLtYgAAAIABbt26RdWrVydjSC78sBof+cnz9fW1dnEAAABABxkZGbzyQv49bgzJhR95UxcLPgg/AAAA9sUUXVbQ4RkAAAAkBeEHAAAAJAXhBwAAACQF4QcAAAAkBeEHAAAAJAXhBwAAACQF4QcAAAAkBeEHAAAAJAXhBwAAACQF4QcAAAAkBeEHAAAAJAXhBwAAACQF4cdEZDIZPcjKo6v3s6xdFAAAANAA4cdE9ly+T60/3EVDvz9i7aIAAACABgg/JlIroBy/vp+ZR4mpOdYuDgAAAKiB8GMioZW8qaqfJ7+9+1KytYsDAAAAaiD8mIiTkxO1q+XPb8/ddMHaxQEAAAA1EH5MKK+wWHH7Zmq2VcsCAAAAqiH8mND8AU0Vt3ddvG/VsgAAAIBqCD8mFFDOQ3F7/n9o+gIAALBFCD9mVFQss3YRAAAAQAnCj4n99Epbxe3UrDyrlgUAAADKQvgxsS71Kytuv/rjcauWBQAAAMpC+DGjs0np1i4CAAAAKEH4MYMdk7oobj/KybdqWQAAAEAM4ccMavh7K27vi0+xalkAAABADOHHDDzdXKh2ZR9+e+LaU1RYVDr5IQAAAFgXwo+ZRDUJUtwe9B1WegcAALAVCD9m0rpmRcXtuJsPaVn0FauWBwAAAEog/JhJj0aBovuf7Yy3WlkAAACgFMKPBeUVFlm7CAAAAJKH8GNBOXkIPwAAANaG8GNGvp6uovvZ+YVWKwsAAACUQPgxo8jG4n4/h6+lWq0sAAAAUALhx4wialcS3Z/61xmrlQUAAABKIPyY0cCW1enDAU1F2+5nPLZaeQAAAADhx6ycnZ3opfY1af6zTRTb2i6Ipj+O37JquQAAAKQM4ccCGlX1Fd1H8xcAAID1IPxYgIuzk7WLAAAAAE8g/FhA02p+1i4CAAAAPIHwYwGuqPkBAACwGQg/FuDkhPADAABgKxB+rOTnmBu060KytYsBAAAgOQg/FvLF4HDR/Vkbz9Pon45TYVGx1coEAAAgRQg/FvJseDX6dnirMtvzChF+AAAALAnhx4KimgSV2ZZbgJXeAQAAJBN+9u/fT/369aPg4GDeKXjDhg1an7N3715q2bIleXh4UN26dWnNmjVkz3LzEX4AAAAkE36ys7MpLCyMli9frtP+169fp759+1L37t3p1KlT9Pbbb9Po0aNp+/btZK8eFxRRcbHM2sUAAACQDCeZTGYT37ys5ueff/6hAQMGqN1n2rRptHnzZjp37pxi2+DBg+nRo0e0bds2nV4nIyOD/Pz8KD09nXx9xctOWMLGU0k0ce2pMtvjP+xN7q5ohQQAADD397ddfdvGxMRQZGSkaFtUVBTfrk5eXh4/YcKLtTs+q3L8ZprFywIAACBFdhV+7t27R4GBgaJt7D4LNLm5uSqfs3DhQp4U5ZeQkBCyRc6YCBEAAMAi7Cr8GGLGjBm8ikx+uXXrFtmi2Ouo+QEAALAEuwo/QUFBlJwsnhWZ3Wdtf15eXiqfw0aFsceFF1u0ZGe8tYsAAAAgCXYVfiIiIig6Olq0
befOnXy7PfF2d7F2EQAAACTLquEnKyuLD1lnF/lQdnY7MTFR0WQ1YsQIxf7jxo2jhIQEmjp1Kl26dIm+/vpr+uOPP2jSpElkTxY818zaRQAAAJAsq4af48ePU4sWLfiFmTx5Mr89e/Zsfv/u3buKIMTUqlWLD3VntT1sfqDPPvuMVq5cyUd82ZMBLarRyVlP05C24s7XI1bF0qV71h2NBgAA4OhsZp4fS7H2PD/KQqdvFt0PKOdBx2eKh/MDAABIXYZU5/mRggdZeXT9Qba1iwEAAOCwEH5sUPdP91Jadr61iwEAAOCQEH5s1NX7WdYuAgAAgENC+LGyd6MaqNyek19o8bIAAABIAcKPlb3erY7K7SNXH7N4WQAAAKQA4ccGVrP/dngraxcDAABAMhB+bEBUkyCV2yU2CwEAAIBFIPzYsENXU61dBAAAAIeD8GPDHZ9f+uEoFRQVW6U8AAAAjgrhx0a80b2uyu313t9Ky6KvWLw8AAAAjgrhxw58tjPe2kUAAABwGAg/AAAAICkIPzZk6aBwtY/9Hlu6uj0AAAAYDuHHhgxoUY3m9Gus8rEZ689avDwAAACOCOHHxvj7uKt9rOviPXTq1iOLlgcAAMDRIPzYkZupOTRg+SGKu5lG+YUYAg8AAGAIhB87NHBFDE37+4y1iwEAAGCXEH5sjK4rWvxzMsncRQEAAHBICD92LCuv0NpFAAAAsDsIPzamaTU/3feds53OJaWbtTwAAACOBuHHxtStUo461wvQef/vDySYtTwAAACOBuHHBi15Uf1khwAAAGAchB8bVLm8h7WLAAAA4LAQfmxUiL+XtYsAAADgkBB+bNSmCZ1o9ag2VLOSt7WLAgAA4FAQfmxUBW936t6gCo3pXNvaRQEAAHAoCD82zslJy+OWKggAAICDQPixcU6INwAAACaF8GPn9san0KOcfGsXAwAAwG4g/Ni48JAKGh9/lFNA4fN20p7L9y1WJgAAAHuG8GPjGgf76rTfqNXH6Lv918xeHgAAAHuH8GNHhrStofHxBVsuUXpugcXKAwAAYI8QfuyIiw4/rbyCIksUBQAAwG4h/NiRqn7aZ30+i1XeAQAANEL4sQPfDm9FL7SqTq92qqV131d/PG6RMgEAANgrhB87ENUkiD79Xxh5urnQ0kFY8R0AAMAYCD92ZkCLatYuAgAAgF1D+LFD3u4u1i4CAACA3UL4sUNxM5/W+PjAFYctVhYAAAB7g/Bjh7zcXahWgI/ax+NuPuTz/Ry88oAGLD9E++JTSCaTWbSMAAAAtsrV2gUAwwSUc6frD7LVPr7yQAIt232V3355VSxV9HajHZO6UuXyHhYsJQAAgO1BzY+DrvYuDz5yD3MKKGrpfjOXCgAAwPYh/NgpJ0H2Wf96B52ek5aN1d8BAAAQfhwg/LSsUZGOvR9pzeIAAADYDYQfB2n2Ql8eAAAA3SD8OEDNDwAAAOgO4cdOIfwAAAAYBuHHQUd7qXMrLcfkZQEAALAnCD8Sq/m5/TDX1EUBAACwKwg/dmry0/X59cgOoYpt8wc01fo8F2e0lwEAgLRhhmc71aJGRbowL4q83Ut/hN5u2hc8LSgqNnPJAAAAbBtqfuyYMPgwfZtX1fqcYSuPYp0vAACQNIQfB+Lp5kLXF/YhLy01QNvP37NYmQAAAGwNwo+DcXJyov/e6qRxn3G/nKBfjty0WJkAAABsCcKPA6od4KN1n5kbztGJxIcWKQ8AAIAtQfhx0NofXey5dN/sZQEAALA1CD8Stmz3VWsXAQAAwOIQfiQOI78AAEBqEH4cVMOg8jrtV1CE8AMAANKC8OOg1r/egfy83KhpNV+N+91IzS6zLfNxAeXkF5qxdAAAABIOP8uXL6fQ0FDy9PSkdu3aUWxsrMb9ly5dSg0aNCAvLy8KCQmhSZMm0ePHjy1WXnuaADH2/R707xudqGfjQLX79fx8P129n0WPC4r4fXbdbO4OajpnO5rEAADAIVk1/Kxbt44m
T55Mc+bMoRMnTlBYWBhFRUXR/fuqRyH99ttvNH36dL7/xYsX6YcffuDHeO+99yxednvg4epCzs5O9N2I1hr3i1yyj577+jC/nfSoZOHTYhmaxAAAwDFZNfwsWbKExowZQ6NGjaLGjRvTN998Q97e3rRq1SqV+x8+fJg6duxIQ4cO5bVFPXv2pCFDhmitLQLtLt7N4NfCQfKFxVgHDAAAHI/Vwk9+fj7FxcVRZGRkaWGcnfn9mJgYlc/p0KEDf4487CQkJNCWLVuoT58+al8nLy+PMjIyRBdQrbhYJpojCDU/AADgiKwWfh48eEBFRUUUGCjuj8Lu37uneu0pVuMzb9486tSpE7m5uVGdOnWoW7duGpu9Fi5cSH5+fooL6yckRXUqa5/1ud9XB8U1P1gBHgAAHJDVOzzrY+/evbRgwQL6+uuveR+h9evX0+bNm2n+/PlqnzNjxgxKT09XXG7dukVS9Oe4Dlr3OX8ngx7lFijuF7KOPwAAAA7G1VovHBAQQC4uLpScnCzazu4HBQWpfM6sWbNo+PDhNHr0aH6/WbNmlJ2dTWPHjqX333+fN5sp8/Dw4Bep8/dxp6+HtaTXfz2hcb8lO+MVtwtQ8wMAAA7IajU/7u7u1KpVK4qOjlZsKy4u5vcjIiJUPicnJ6dMwGEBisGwbO36NKtKw9rV0LjP/vgUxe3Y62mU8bi0JggAAMARWLXZiw1z//777+nHH3/kQ9fHjx/Pa3LY6C9mxIgRvNlKrl+/frRixQpau3YtXb9+nXbu3Mlrg9h2eQgCzTrXC9B538l/nKYBXx0ya3kAAAAk0+zFDBo0iFJSUmj27Nm8k3N4eDht27ZN0Qk6MTFRVNMzc+ZMPhqJXSclJVHlypV58Pnoo4+s+C7si6+Xm177JzzI5rVq8lFg287dpWoVvKlZdT8zlRAAAMC8nGQSay9iQ93ZqC/W+dnXV/PSD46I/bhrzdii13M+HtiMBrWpQeeS0umZZQf5thuL+pqphAAAAOb9/rar0V5gPFaD071BZb2es3DrJUrPKaBrKVlmKxcAAIClIPxIkL5VfY9yCihs3g7KfIzFTgEAwP4h/EiQodP3bDyVZOqiAAAAWBzCjwQZ2s3r2I2HJi8LAACApSH8AAAAgKQg/IBB4pMzrV0EAAAAgyD8SJCqVq8u9fUbATb/vwt8zp8Rq2IpJTPPdIUDAAAwM4QfCZKpGO/VJFi/ORMOXHlA4345wZfDWLDloglLBwAAYF4IPxKkqubHw9Xwj8KDLNT8AACA/UD4kSBV4adSOQ/65dV2JjseAACArUL4kSB3QS3PzL6NqEOdSvS/VtWpU70A8nbXf4HYIkMnDlIxBH/Jznj69/QdkxwPAABAFYQfCZrbvwmF+HvRhwOa0ujOtem3Me3J083F4FqcmIRUevGbGKNHgB29nkZfRl+ht34/adRxAAAANEH4kaBaAT50YOpT9FL7miY7ZuyNNBr3c5zifnGxjPILi/U6RmpWvsnKAwAAoA7CD4g4ORn+3NTs0vDSf/lBavXhTsrNLzJNwQAAAEwE4QdMJj23QHH7XFIGXwj11K1HRg3BBwAAMDWEHzArBBoAALA1CD9gXsg+AABgYxB+wKyQfQAAwNYg/IBJ/XLkpmjen2KZjI/8AgAAsBUIPyAiHOz11lN19X7+zA3nqMsnexT3h/8QS5FL9lFBkX7D3gEAAMwF4QdE/LzcFLeLDFy3IulRruh+woNsOnNb91FfAAAA5oTwAyLfv9yaGlX1pdWj2pApK2uw/hcAANgKV2sXAGxLk2A/2jqxM78dcy3V5Mdnsz4L1xYDAACwNHwLgdkXLGUu3M2gHw/foPozt9LnO+NV7oPaIQAAsATU/IBFws/sjecVt7+IvkKTnq5vsmMDAADoAzU/oFY5j9JsPCA82KplAQAAMBXU/IBaY7vWppO3HlK/5sE0uG0NquDtTmsO37B2sQAAAIyCmh9Qy9fTjX4d3Z4HH0bYVFWtgpfZ
X58Nj09IyTL76wAAgLQg/IBecwC90Ko61Q7woZ5NAo061r+n75TZJuxhdD/jMfX/6hA99dk+o14HAABAGcIP6OXT/4VR9JSu5OXmYtRx3vr9JB/2Lrfp9B2+Te5mWo5RxwcAAFAH4Qf05uTkRMMjahp9nJ0Xkunq/Ux++01B8AEAADAnhB8wSFU/4/v8vPHbCYpcsl/rnD/741PKLJkBAABgKIz2Aps3YlUsv76xqK+1iwIAAA4ANT9gtA51Kpn8mDJM9wwAAGaCmh8w2K7JXelaShZ1rV+ZGs7aZtJjy9QEItbfCAAAwBio+QGD1a1SjqKaBJGnkSO/zt5OL7NNVcXPlD9PG/U6AAAADMIPWF2/rw6W2SZTUfez/kSShUoEAACODOEHbNLD7AJrFwEAABwUwg+YxDcvtSRPN/HHqbynq1HD4AEAAMwB4QdMolfTqnT+g16ibcuHtrRaeQAAANRB+AGTcXEWj8Ryd8XHCwAAbA++ncAswqr7kTOGpQMAgA1C+AGT2vNON5rRuyH9PrY9uQg+XQuea2bW1y0uLh0dlvm4gLLyCs36egAAYL8QfsCkagX40Gtd65C3O+vsXFrzM6hNCNXw9zbLa07+4xR1WbyHsvMK+UrxzebuoKZztlORIBABAADIIfyA2Qi7ALH+QLundDX6mI9y8lXO/3P7YS5tOn2H0rJLH8/OR+0PAACYKPz8+OOPtHnzZsX9qVOnUoUKFahDhw508+ZNQw4JDqh2QDnRfVdhO5iBwuftpKMJqSofy8kvEk2OiB5HAACgikHfRgsWLCAvLy9+OyYmhpYvX06ffPIJBQQE0KRJkww5JDggP283OjT9KYqbGWnS48759zy/LiwqpkVbLym25+QXipbFwDpgAACgikGz0N26dYvq1q3Lb2/YsIEGDhxIY8eOpY4dO1K3bt0MOSQ4qGoVSkKyKRUUFdP28/fotZ/jRNtZFx/08gEAALPU/JQrV45SU0uaHnbs2EFPP/00v+3p6Um5ubmGHBJAZzdTc+jjbaU1PuoWQ0W9DwAAmCz8sLAzevRofomPj6c+ffrw7efPn6fQ0FBDDgkSsfGNjkYfo7BYRrfScrTuh1ogAAAwWfhhfXwiIiIoJSWF/v77b6pUqRLfHhcXR0OGDDHkkCARYSEV6MaivkYfp6BIdbSRCap/hLcBAACM6vPDRnZ99dVXZbZ/8MEHhhwOJOif1zvQc18fNutrIPoAAIDJan62bdtGBw8eFNUEhYeH09ChQ+nhw4eGHBIkpkWNivRqp1omP66wsgcVPwAAYLLw8+6771JGRga/ffbsWZoyZQrv93P9+nWaPHmyIYcECRrd2fThRwThBwAATNXsxUJO48aN+W3W5+eZZ57hc/+cOHFC0fkZQJuqfqYdBv/5rnjqHx5s0mMCAIDjMajmx93dnXJySkbb7Nq1i3r27Mlv+/v7K2qEAKzhr7hbitvC2Z4BAACMqvnp1KkTb95ikxrGxsbSunXr+HY27L169eqGHBLAJM4llYZv9PkBAACT1fywkV6urq70119/0YoVK6hatWp8+9atW6lXr16GHBIkqpKPu0mPty8+RXFbOfuwoe+vrDlGQ747gmHwAAASZlDNT40aNei///4rs/3zzz83RZlAQsJDKlD0pftmObZywGELn+5+8lp30h+bZekNAABw0PDDFBUV8XW9Ll68yO83adKE+vfvTy4uLqYsH4DBlOt2igRhaP6mC7TipZZY/BQAQIIMava6evUqNWrUiEaMGEHr16/nl5deeokHoGvXrpm+lOCw+javyq+rVzR9LYxyy1YxW/n0iW3n7yn6B91MzaaLd9FRHwBAKgwKP2+99RbVqVOHr+7OhrezS2JiItWqVYs/pg82QSJbD4wtitquXTvegVqTR48e0RtvvEFVq1YlDw8Pql+/Pm3ZssWQtwE24LkW1Wjt2Pa0+c3OJj/2kYRUHmzkTWDjfzkhenzXxWR+3XXxXur9xQFKzcoz
eRkAAMBBmr327dtHR44c4UPb5dj6XosWLeIjwHTFRomxUWPffPMNDz5Lly6lqKgounz5MlWpUqXM/vn5+XxRVfYY62zNOlrfvHmTL7cB9ok1O7WvXbI2nFxkoyq066Lx/YDe/P0kv/59THtyd3WmmIRU0eNfRF+hSU/XV9y//TCXKpXzMPp1AQDAAcMPq3HJzMwssz0rK4vPAaSrJUuW0JgxY2jUqFH8PgtBmzdvplWrVtH06dPL7M+2p6Wl0eHDh8nNzY1v07aKfF5eHr/IYR4i23XugyjKzS+ixdsvmfS4Q74/olOnaHn3n+y8QioskpGfd8lnTLnpzNkZ/YQAACTX7MVmdB47diwdPXqUf3mwC6sJGjduHO/0rAtWi8NWgY+MjCwtjLMzvx8TE6PyOf/++y9fTZ41ewUGBlLTpk35zNKs87U6CxcuJD8/P8UlJCTEgHcMllDOw5Uql7dszcvPR26K7rPPcpM52yls3g4exITWxiZS07nb6ahSDRIAAEgg/Hz55Ze8zw8LIqyvDrt06NCB6taty5uudPHgwQMeWliIEWL37927p/I5CQkJvLmLPY/185k1axZ99tln9OGHH6p9nRkzZlB6erriwvopgf1qXt3PpMebvfG86L6gTzQlppXMYi43ff1ZPlz+jd9KmtMAAEBCzV6sj83GjRv5qC/5UHc2+ouFH3MqLi7m/X2+++47PqS+VatWlJSURIsXL6Y5c+aobaJjF7Afb0fW5/PxDG1Xk7o3qEybTt+lVYeu88eKzTw5ofD46pfHwASJAACSCD/aVmvfs2ePqC+PNgEBATzAJCeXjLiRY/eDgoJUPoeN8GJ9fYRzCbHQxWqKWDOaPv2NwHYFV/CiY+9HKubgufFkxBZTXGy+1/077jZ1qBugdT9MDg0AIJHwc/KkblX9uk4ax4IKq7mJjo6mAQMGKGp22P0JEyaofA4bSfbbb7/x/Vj/IPl6YiwUIfg4FuHnSBh4AlifoLvmec0fY27yi9z5pAzadSGZRneuTZ5upYEb2QcAQCLhR1izYyqsNunll1+m1q1bU9u2bXl/oezsbMXoLzaJIhvOzjotM+PHj+frik2cOJHefPNNunLlCu/wrO/cQmBfhGFjwXNN6bmvD1NKpvnn5Jny52lFP6C3etRTbDd30xsAANjo8hamMGjQIEpJSaHZs2fzpqvw8HDatm2bohM0mzhRXsPDsJFa27dvp0mTJlHz5s15MGJBaNq0aVZ8F2BuwrBRvaI3HZzWnRrM3Gax1z+XlC66j+wDAGDfrBp+GNbEpa6Za+/evWW2sRFmbFg9SIhS2PBwdaFlQ1ooJjG08MtjRXgAACkOdQewJFWjrjxcLffRVQ47iD4AAPYN4QdsXkVv63ZmF879wyH9AADYNas3ewFoE9kokEZ2CKWwENNOcKhPzc/jgtLZnpF9AADsG2p+wOaxtbTm9m9Cz7WorvLxuJmlS6SYQ+z1NGo4q7SDdVZeIf2itCxGWnY++gIBANgJhB+we+ZeiT1baY0vZuaGc3T8Rhq/vefSfWo5fyfNWH9W7THuPMqlgiIzztAIAAA6Q/gBu9SxbgCV93CltqH+VivD9QclM08v2RnPr9ceU71uXNzNNOqwaDf97xvVC/YCAIBlIfyAXfLxcKW4WU/Tutfa8/urR7axeBl+OFiy3pg2656EolO3Hpm5RAAAoAuEH7Bb7q7OimUwalf2sfjrX7qXya+VV3RJzcrjTWCnEXYAAGwSwg84BGcd15SzBNYf6PfYRHp2+SFrFwUAAFRA+AGHGRFmLcqDvOKTS2qEAADANiH8gEOwVvb5MvoKnVVa++taSklHaIY1f+29nGKFkgEAgDoIP+BwzV7/vdmpzONtQiua5XXlI73UYc1f9y2wAj0AAOgOMzyDQxB2+alS3oO+HNKCjiak0ux+jSnuxkNqUaMiNZptuZXgAQDAdiH8gENwEaQfNgKsf1gwvzAd6gaQVLCRZqwLUoCZJ34E
ALBnCD/gEORD3ktuW68c1hzezmaQbvXhLn47/sPefCoAAAAoC38dwSHYyrpaxgxvf5idT1fvZxn8/IzcAsXtR7n5Bh8HAMDRIfyAw1FX8TP7mcaK2wPCS5rErOFWWg59uv0ypSh1hG4xfydFLtlHN54sm8HM3niOxvx0XKdwJ6z9AgAA9RB+wOGoiwmvdKplE5MivvDNYfpqz1V66/eTim3yRVL57ZsPFbd/irlJOy8k04W7GVqPm1+IhVMBAHSB8AMOwdu9tPuat7uLTU+KmJxRUuMTKwg8/525q7j9zp+naf5/F6i4uDTGKVf8pGXn09Jd8bwWSe6LaMGwe9toBQQAsEkIP+AQvNxd6LfR7ejX0e1EQUgda2WfZnO2K24XFcso43FJP53C4uIyi6YWCsKPck3Vu3+epqW7rtDzKw4rtm06XRqgAABAPYQfcBhsSHtHHYe1u1gp/WTmFYruT/itpOlLkHMUjiSkqi2v/DHlfkMAAKAdwg9IEpv00Bbsjy9Z+kLYxCV3M7W047OL0m+qsCYo6VEuFRYVizpFo9ULAEA9zPMDknJkRg9Kzngs2taoqi/l5BfSzdTS/jOWll9UtrOyMMAoN3sJ73ZctJs61wsQ7W8jI/8BAGwSan5AUoL8PCkspIJo29aJnenv8R1ocJsQq5Vr/YmkMtuEASYrr5B3clbXDHbgygMqRuIBANAJwg/Ak+UgpvVqaJXX/ivutsrtwmas/l8dopbzd/IQpG6o/uOC0tojGRq+AADUQvgBeKKijzt98kJzi78uG9quiqr4Ip8AUduEhqo6UAMAQAmEH5CkxsG+VLuyD3WoU0m0/cXWITS8fU2Nz+3RsApZwtZz99Q+pm2wmqoO1AAAUAIdnkGS3FycadekrioXQZ3RpyH9fOSm2udaqm9N7PXSSRDl5C9tzRmqAQDsHWp+QLLYLM+qmo/YJIkX5kXRlrc6q3yeNStV5H15tM1ThM7PAADqIfwAqMACkKdb6a/HzL6NFLeVY8X03g1p04ROFinX5D9OU25+kcoaKyG0egEAqIfwA6BGrQAf3r9nYMvq9L/WpcPglVdY71Q3gCr6uFmkTFfvZ9GqQ9e1NntpqvlZsjOeFmy5qPW1ztx+RDM3nBUNsQcAcAQIPwBqsCaxH0a2oc9eDCNXQTOTsKnsi8Hh1LSaHyktzWVWX+y6orXD82c7LvPrh9n51P3TvYr7eYVF9GX0FfpufwLdSxdP9qiMDa//5Ugizd54znSFBwCwAQg/ADrw8XClMZ1r0cgOoeTrWTpO4Nnwavza20P7SvKmnA1avjK8OlvOlowU++P4Lbr+IJuW7b6qWExVTnkxVTkWmLou3qO4H5+caaKSAwDYBoz2AtDR+30b8+sJv51QOUmiJeUWFOm0n6ebi9q+QOrmCmIryltzqQ8AAHNDzQ+AnqZGNaT6geXKTIgYrrRshq2MaBPKLyyt7WEPsfmATt96RI8FYUp5nTEMHAMAR4OaHwA91ajkTTsmdS2zXdikZAtWHkigDzeLOzb/FHNDcduJnHjnabZP1/qV6cdX2trk+wAAMDXU/ACYSAVvy4z40pVy8GGEzVms1WvN4ZIwtC8+RdEhmjV7mQoLUuyYAAC2BOEHwEQWPNeMbJ2wVofV+tx+mCt6/KWVR8s8x5h6oL5fHqDmc3dQTn7JgqxCcTcfUp8vDtCRhFQjXgEAQH8IPwAmEuLvTdcW9CFb9en2y6L5f77dl1Bmn2M3Hhr9OizMsPmB2Ar0l+5lUl5hMZ269ajMfoO/i6ELdzNo8HdHjH5NAAB9oM8PgAmxZSfejWpAm07foYQH2aIOxtb21Z6r1KdZkN7PU57UURt5mPESjDRj/YuUFRShbxEAWAdqfgBM7I3udWnb212ooo31AWLuapnYUFdsdJhwhJgqNzBcHgBsFGp+AMxEVW2HtZ1MLNv8ZEi/obAPdvAh8GwBWFcX1f+GwhB5ALBVqPkBMBNti4/aml0XklVuv5aS
TUmPSjtGp+cW8H48bD6gR7kFJmsuAwCwFIQfADOxs+zD+wSp0/WTPSpHjLHFTy2xmOsdQfgCADAWwg+AmQiXj5j3bBN+PbFHPcU2NrHgjN4NKbJRFbIFysPehQoFgUdYo/PKmuMqh7Hz/QS32Vw/bM2wyetO0eGrD3QuE1tRPnLJPuqwaDeZAmqjAIBB+AGwQLPXiIhQin2vB70dWU/0+Gtd69DUXg3JFjzI0rxYqlyRUoBYsiOeh4o3fz+pWD1eOWjM3HCOFmy5SOtPJtFQFXMJqXMjNZtM5dK9DGq/MJp+j00UbWcj8pbsjKe4m2kmey0AsG0IPwAW6vNTxddT7WKi9iDzcUn/HuXVL84mpfMJC9nwfvnq8YxMqVZJU82SJWpqpv51hpIz8mjG+rNllvz4MvoKDVwRY7LXAgDbhvADYCbOWoKO/HvdXuIQWxqD1ZKwxVCVPS4oO5/R3sslS2bIORvw18aUy4ypm3OJ9SkCAGlB+AEwk6WDwqmchyvNH9BU437qhorbmrn/nqemc7aX6XwsM8HQfzaa7NiNss1OqoIWAICxMM8PgJm0qFGRzszpSc7Omut2Qit5U99mVcnXy61MfxRbcvxmydIXqw+Vrgwvb5pi/xmj45MOzZsmdKJm1f0U29VlH9aB+r/Td6lTvQAK9PU06rUBQHrs45+cAHZKVfBpVNWXXz/fshq/Zv2Alg9rSQufb1ZmqQxb9FhplfZzSRkUc0374qTJGdpnlz55q3RtsVtpObyTsipf7b5KU/48TX2/PEi6Utd9yI67YQGAgVDzA2Bhf4+PoGv3s6lptZIQJDS+Wx3678wd+umVdlSlvAc1mbOdbM1ppUVKcwuK6Ou914zuA6Wss2BuIWVrntQ+6TpCjTG2dgoAHAfCD4CFebu7ipp2hKb1akhToxqoHRVWw9+bEtOsu2bWwxz1szproq35TxvWvMbOy4ErKZSZp3puIQAAXaDZC8DGaBoOHxrgQ/bq4l3VTVjK2AzSRxJS1TZbGdovypzzG8qnAQAA+4DwA2DDqlXwEt1nsSignAc5spUHEmjwd0fKbJcZuWCsubLP4u2XqNncHbRTzdpoAGB7EH4AbNimNzvR6pFtRF/gv41pR/Zu2Mqy4YbJziuidcduqXzsXFI6dV28h7acu6vXa3277xpNWndKw7B545rjlu8p6e/0wabzZO+w/AdIBcIPgA3z93Gn7g3Fa3/Z6CAwvRy6qnp02MfbLlHCA9VLWoz7JY5PtKj8/cy+sNfGJtIppY7Ycgu3XqJ/TiaJjvvd/tIO2hjtVeKfk7ep7YLoMh3aARwRwg+AnbHnJTKMcTdd9VD5/Vce0PT1Z2nA8kM6H2vBlktGl4fPb+RANSWT1p2mlMw8ev3XE9YuCoDZYbQXgB0p+cK1dilsy5XkTKv8HF764SjvnO1oCotVLwMC4EgQfgDsSK0AHwryw4zGQtYIgxm5hWWa7iRaIQdgl2yi2Wv58uUUGhpKnp6e1K5dO4qNjdXpeWvXruVNAAMGDDB7GQGs6Y/XImhYuxr0TlQDvl4YGD95oXyCREMyS5GKxHUrLZevbA8Ats/q4WfdunU0efJkmjNnDp04cYLCwsIoKiqK7t+/r/F5N27coHfeeYc6d+5ssbICWEvbWv700XPNyNfTTefnvNqpFknN3XTxoquabD6jftQYa84a/sNRmr3xXJnHCoqK6ekl+1Q+783fT9LjAvHyHwBge6wefpYsWUJjxoyhUaNGUePGjembb74hb29vWrVqldrnFBUV0bBhw+iDDz6g2rVrW7S8ALZgUmR9fj2uax3R9va1/RW3ezQSjxJzVMJKmJ9jbur8vDn/nqeElCyVj51IfEgHrjygn1Qc7/K9TErNzld73Pwi++4zY+g8SgD2xKrhJz8/n+Li4igyMrK0QM7O/H5MTIza582bN4+qVKlCr776qtbXyMvLo4yMDNEFwN5NjKxHp+f0pP5hwYptXw1tQd++1Jqk/iV2
9nY6DfnuCN1QM2ReKEbFTNLy2h11irV0MkKHdADbZ9Xw8+DBA16LExgYKNrO7t+7d0/lcw4ePEg//PADff/99zq9xsKFC8nPz09xCQkJMUnZAazNz0vcBNa3WVUq7+kquQ64y/dcFb3nfl8d5KGm26d7NYYYVT7fGU9Ld8WLpoN+eVUs7b1836hw40hD4gEcgdWbvfSRmZlJw4cP58EnICBAp+fMmDGD0tPTFZdbt1TPHgtgj8Rhx0kUeCSSfSjjcekipxtOijscd1i0W69jfRF9hZbuuiI65r74FBq5+pjONT/Kztx+ROHzdtIvR3RvkjOVrLxC+ivuNqUbuBgtgKOy6rARFmBcXFwoOVm8Jg67HxQUVGb/a9eu8Y7O/fr1U2wrfjInhaurK12+fJnq1BH3gfDw8OAXAEcU4u9NM/s2It8ntUDCCRDZKuorhrWk66nZ9Mm2y6LnNQgsT5efzI/TrpY/Hb2eRo4g6ZG4wzObtE8T1jSoqoZM1Vw38cmZVD+wPOkztQ+reer/VcnkizM3nKOX2tfU+blrDl2nuMRH9PmLYeTqYti/U6f+dZq2nL3HO8yzEYO6kEqNIUibVWt+3N3dqVWrVhQdHS0KM+x+RETZX9SGDRvS2bNn6dSpU4pL//79qXv37vw2mrRAikZ3rk0vti772WffYb2bVaXXu9WlIzN6UA1/b5XDw9e9FkF73ulGUqTui56tBaas5+f7dWvCEjy88ZThQ9/nbrrAh85vO6+6C4AuWPBhYh0k3AI4TLMXG+bOmrF+/PFHunjxIo0fP56ys7P56C9mxIgRvOmKYfMANW3aVHSpUKEClS9fnt9mYQoAyn6xs4kRu9QvbSpW/v5mkyfeWNSX5j3bhKRk1oayQ9mZgiL1AUdbxc93B0rXDUvPNb65KScPQ+cBHC78DBo0iD799FOaPXs2hYeH8xqcbdu2KTpBJyYm0t27+q3iDABl1wATBh51X+AjIkIp9v0eJBWFxTL650SSXs9Rvzq8eJV3a3d0Zs10mtxMzaZxP8dhIVOQJJuYKnbChAn8osrevXs1PnfNmjVmKhWAfVNu0RF+Z2v6Uq5S3pPWjGoj6uTryLLzda9ZYYFB1ezO1sZmq1596Dpv/qxZyYfOJaXTM8sOanzOaz/H0aV7mbxZjdX6yaHLD0iB1Wt+AMA8lJfBENZYaPv67tagCjWu6mumktmvrov30tDvj5rseL8dTaS5/543uoaI9VFiNU4DVxxWjFDT5kaq6nmQ7qQ/ppUHEujq/Uz6+chNytUjHALYC4QfAAcz+5nGNL5bHaoXWF60XVRjocN37TtRJbNID21XQ7Sd1RL8r1V1fntuv8YmKbNUCOcLYt775yytOXyDwj7YwWeVVkmHqhj5aL0HWepnni57WPUH/nDzRYpcsp/3ifp42yWdjwlgLxB+ABzMK51q0bReDVWuV6WPpxoG0slZT9NHA5rSOz3rU9tQf7r8YS/+2OL/hfEQ1KSan8nK7YiUK3TUNSWyeYWe/7qk1sZSdB3SrkstkiarDl6n134+rveEk4b6bMdl+mjzBYu8FtgvhB8AiRCGH11jUEUfd95xesJT9eiPcRHk4epitvJBqQ82nVfcztAwYiwxNUdls5QuwUbXvj3y/fIKDWv+mvffBdp+PtkiK96zRWWX7b5K3x+4TvfSH5v99cB+IfwASISw2ctXMDO0MfTtqvJci2okJcL5lPSx+tANUROUupmjuyzeQz2XileY33r2Ll26q3mklz5YkJr/3wVqMHMbnb+TbvBxsvNKZ802F+Hs25aqaQL7hPADIBHCDs+fvRhOTYJ96ZuXWprs+NUqeCluuzirrleIbCRex89RsUVVWf+etbHql9MxtiPxt/sT+PWttFxRVd74X0/QvzrUsihPhaBpvx8OXue3P995xdDiGhgDARx4qDsAWLbZq26VcrT5rc5GH7Nh1ZJO1R6uzrTutfbU6eM9/H7X+pVp9yVx515D1sWyV2xR
VXXYyK7cgiJqPHu7wcdn64RtPlM6/1m+AbUcujZ7iXOsTGO4ZkuqANgD1PwASERk45JaF38f082E7uvpRqdmP02nZvckZ0FNwmtdait9aUor/Ggy4beTdOFOhtrH72dq76vy6Q7xWm0GcdJ/VJi6H9/EtSep8yd7KCdffdMWfvRgS1DzAyARL7SsTpXLe1AzE4/QquBdEqbSc0u/JEMDfOjS/N78i/xE4iM6kpBKA8KrIfwQ0eazd2lUx1C1j/f4TNyHh5nyx2n6+8Rt6lwvgBa/EGaScujc4Vmwo7qfn3wNs50XkunZcNX9um4/zCF7wd5HVT9PaorRjA4LNT8AEsGaJLo3qEIB5TzMcnzhl6SrsxO5uzpT9Yre1D8smBY814yvLM5WkGer0Evdoxz1I7gyH5etPWHBhzlw5QG9+G2MQbMwv/5rnKgJNK9Qt6YyXfsGaduXjcAyN1Nka9ape8xPx7XOkA32DeEHAExCWCvg6uys9suRrUIvFFG7EvVtVpWk5PNd8Sq3Rz1ZOV6TxLQceqghPGla4T3zccnz+n55QPfwI7hty/V2l+5l0Df7StdVM9S1FNUzX4NjQbMXAJi8Q7Wri+61BXWq+NCHA5rR7IzH1G5BNEnBeTV9fi5rWYzUWPJ+WWxNL11duJuhc82KNbs791p6wCTHsecu2ywABlfw4n3xQDPU/ACASQi/GNUNddck0NfTtAWCMlj22X0p2eDnm6Pmh41++znmBp+3SJ/JDE8mPhRN3yB1x26k8QDYbbHmxcChBMIPAJhEpXKlo8jcXHT/0yIcTbRpQid6NjxY9LiqBVZ/H9Pe4HJK3Strjpvt2Mpdfgp1GILPOoDP2nie+n91SOfXYX1ynvv6MK06ZPp+RHp0cbIpO87f49dp2bqv7yZlCD8AYBLe7q60551udGBqd601P2w/uY51AxS3m1X3oyUvhlOQoBZo5jONjF6nDEoYe9q0rT4vDLJvrz1JLefv1HrMy1qa4FjtDqvVEE4KyTp+Mz/F3CRT07TgKzgO9PkBAJOpFeCj835HZvTgfRTYhIhCLDgdmNad7jzKpTuPHlNEnUoUVt2PTt8uXVqhXmA5k5ddiqvKy+Xr2PlZl1qTjaeS6OeYm3T8pppV6jVgHbJjrqVS1waVFevIrTyYQAu2XKL2tf1p7dgIsmUslLPcr88IOVNg665hNQ/9oOYHAKwiyM+TujWoovKLgjWb1azkw4MP89Or7aich6vW/kEd65bsD+onWFSl5+dl5xZShVX8sFC6Pz5FbS3QxLWn9Ao+wp/+2J/iaOzPcfSRYD2z344m8usjCWlly2OGXkiG5hYWILt9uocGf3eELIlNLNls7g6zNAE6MoQfALB5fl5u1LtpkGibl1vZFeaHtq1pwVI5jhupuk1AyJbl6LBoN41YFUv74lP4JJbCEKQtN/wdVzJfkToxCan8et0x9WuiCRUXlzSLXRSMSDOWoXU2Z5PS+TprR6+XDWlskdXxv8TRj4dLF6w1tolR7mTiI5PV3EkJwg8A2IUpPRtQ7co+NOuZxmr/hd6nWRD98HJru+20auviBDU6I1cfo7YfRdP7G87p/Pwpf57WqUOurvU5SY9yqfGcbdT7C92GuV9LyVJ0DFZH+bPDmpQ2nEyih0Z0JGYzYG89d4/m/Htep/0nrzvF3xNCjfkg/ACA3TST7Z7SjV7tVIvfVx4VxrAmtB6NAqmqmmax6b0bmr2cUiNvlmJ0CZ3ZeerX/zKkVupxge4BgS0dwprVDl8r6TCtmlOZpsK3152i9zecJUNlPZlcUlfrTybxuZgOXdVUTjAGwg8A2KXZzzShJS/qt87V008WdwVzcdJpJnBWozH1r9N8fh9NiSnuZppZRvmdTyptJmNlyRIEMmFxWNMTW+dLPkO2Pthz2YKvn++MN7xnEmowzQbhBwDskpe7Cz3fsjpfQ0yZui8bdz3mHwL9sSHp2rCamvozt9Ifx2/z+X2+jL6idt8h3x8V
3V+x96rOZdHUZUYYcLou3kNN52ynDBW1M8ZkLdZEyJq7vtDw/radu0uRS/bxUY+aZuQG08NfAgCwb3p8QYX4e9PIDupXVAfj/HBQ+4ij0T8d0/l4yn1eftRjXh9NI8GEoeJu+mN+fSrxUZnQpG4Ve9HryGT0U8wNPuO0cudwbcb9coKu3s+iN349ofJxXaIP4pFhEH4AwK6V83TV+oXAJk386Lmm/Pbc/k3UHmuAin5EYFpsRJShtUjGfNHfzygJOeqCiTzojPslTq9mNtYsNnvjefpQMDyfEeYmNneRXGpWXpljCCdwFLJExU9RsYxOJD7kHbvVhTtdzgObo4ktO2IvEH4AwK6tHtmG6lT2oVUjWyu2Kf+pPvJeDxrWTvMw+NY1K9LSwS3o1OynzVRS0Mfba08ZFQZY+Dh89QF9sesK79MjDDWLt19Wub8yXWp+rqZkqTiWuN5px5N+Q0yrD3fRZzvKvr61mr2W7b5Cz399mCatK3u+5R2+O328m88npG2uoTYf7iJ7gfADAHYtLKQCRU/pRk81LO3MLK/l0Yf8y6qCd+kaZWAlMqKM3AKVYeDIk7mAdDgEDV15lD7fFc/79Jx40qwltFowMaCqoFOoosaD1SCxIfby+Xs+2aY6SGmap2fZ7quUnlP6/u48aXrj5RC8piWatL7fn6CxQzdbe401DUZfVD07OHMluSQAZuo5ks+asLwFADgcFoQuzIuiv+JuU1j1CtYuDhhAVRhhX8K6zqCc9FBz81piag59sOmC4r6qrFJUJN7IgknbBdH89rReDcnLTXX9gUwwYaM6YfN2qNxeJCgIC28b3+jIA76pnUtKp+oVvVQ+xprAlu66Qt2Ulp5Rh4VAe4OaHwBw2IVWR0SEGvTF4aFiBJkQa2YD88kvKjZ64YqxP2tevb7L4j1aw5YwiDAfbCqdpPDjbZdoriA8Ca06eJ2+3VdSo6Iv5XK8skb3DuLKWB8cNhu3ckf04zfS6JllB6njot0ql5dZc+gGrdh7jQYJgua5O6Vr66n6eek7M7W1IfwAgOTM7NuIL4/xxeBwlY+ve039ApoTe9TTOIwaTMPYc5yjphOxOmzyw02n74i2CZug9Bltpmn4vi5LdgilZueXCRSsA/L5O+llyqfsz+O3+Dps8/8Th7S9l1P4dXZ+kWiOI7kbqdlltimHuX9O3qazTxYbLhTUkJloKiazQ/gBAMkZ3bk2XZzfi54Nr6bY1iTYV3E7XE1t0ft9GtFbPerp1BEWjKPLUHFTe/N38cKvU/8+Y9BxVPUV0pVybZOwPw5rqmPD6qf/fYb6fnmQlu6K13isrDzV59BZa2cizTuw0WuT1p2mfl8dLCmz4P3ay+8Gwg8ASNqWtzrThO51aWov9Utf9GhYhcZ0rkWjO9ciF2cnnZpk1r/ewaTlBMuT15Doy5CZqOVLbqh6Lls0lWGrxrNh9X8+WSD2y926T/oo5Kwm/bB+QIy2QWZX7meqfQzhBwDADjQO9qV3ohpQOQ/V4z/+e7MT/TCyDb3ft7Gif8QAQY2ROi1rVOTPbVzVl356pa3Jyw22q1C57UqPpjJVTVnyiclVZSr5JI1C644l0oErKXoPoX/x2xidylrm2aIlQcguIPwAAKhw7P1I2jShEzWt5lfmsQlP1eWrx6vDaocY9twtEztTl/qV6cDU7mYtL9gOY/q96FtzMuXP02W2Tfv7LA3/IVZUg8PWUWOdp1knaGcnzf2ktDaLKYUn4T22TIiqiRxtDcIPAIAKlct7ULPqZYMP4+bizFePl2Odpze80VFxv01oRZVLa0TUrqTz6wuPB9Khqs/P8j3XKHT6ZqOOy9ZR233pPv12NFHlCC859jpOes4wJDxe24+i+USOqtZKsyUIPwAABvp3QkfeH+jw9Kd4J+m973Sj17rUpi+HtDDquLUDfNR2ugbHJM88BrSY6eVRTr7WmaMv3lW90Kp8GQ5dotG1+2VnvrYlCD8AAAZqXr0C7w9U0adk
VujQAB+a0acRVSnvadRxsZi39By9nkYLt1xUWfNjyuH+bKmNj7dd0vi8zMeqZ2puNHsb/Xq07HB/VR9XW+/6g/ADAGAFHeqIm8Cq+pUGJk3NEuC4vt2fIFr2wliq5hu6dE/9SC25y8nq93n/n3OicM7mIFL1cbX1js8IPwAAFtKoaulcQs+3rC567J2eDRS35d8lrMO1u3yoj5LBbULMVEqwpj5fHiBb5ySo69l05q7KPkKqJk+0JQg/AAAWMqVnfRrfrQ4fAq/s+Zalw+flfTJYh+vLH/bina/lOtcL4DNUz3qmsYVKDaDemVtlF4xlPtHStGZtWNgUAMBCfDxc+YKYqpofhE1dFX3cRNuFQ4/ZYpRshmoAa3FyEg/rl6no4ROvoenMFqDmBwDACmoFeJfZ9t3wVtS6ZkVa/EKYaLs+Q4/Z84WEtUaW4OnmTCdnPW3R1wTrSc8toEt3ywYdW1/jCzU/AABW0KqmPy1+oTk9yMqngU+avHo2CeIXZbr2fx7VMZSm925IS3bG88Um3+helyp4uVHt97aofQ6bk+jYjYdkKm8+VY+PfhvXtQ59s++ayY4LtsNJcPvvEyVLbZhiiQ9LQvgBALCS/7XWrdPygueb0ajVx7SOomkQWJ48XF1oRu9GWo8Z2agK3X6YS8uGtKT2C6N1LzRI3vT1Z8neIfwAANi47g2qKG5rCj8RSsPnlbH1y+SjcFa+3Ebn13d3dabQSt4Un2zbE9cB6Ap9fgAA7BybDXrfu92oZiUflY8fmdGDNr/ViWr4l+1npEmvJkF8uH3czEgypSFtMUwfrAvhBwDAzgWU81AbfJggP09qEuyn98zRbEFWNty+vGfp6DMmqknpumaG6Fq/tCZLG7ZuGoCpIfwAANg5D1fd/pT7P1mGQ1c5+YUqR5x9O1z9ivaNn0zk6OPuorH5TVen5/SkPs3KdgIHMAbCDwCAHRHOqTJ/QFOqXdmH3uurvYMzs+C5Znx017fDW+m0f6d6AYrbutQaubk4UbcGlfntAS1KJ20UalbNj9rV9id9+huN71pX5/0BdIEOzwAAdmBAeDBtOHVHNMHh8PY1+UVXIf7e9Oe4Dlr3OzO3J6Vk5lGdyuXU7rN6VBvFCDTFtpFtFZM1stc6Metpajl/p+LxS/N7kacBzVhNq5UuCwJgCgg/AAB24PNB4XzIu7e7+f9s+3q68YtyDYzyCLSYGU/RtnP36JnmwfQoJ5/qBZYv08zWtpY/xV5Po5fa1zAo+DBY6BVMDeEHAMAOsABgieAzW82aYZ+80JzX9EyKrK/YVtXPi0Z1rKVxJunvh7emA1dTKLKRcZ2k9cHKwmquANRB+AEAkDgWaLacvUs/vdqWAn09Ve7TMMiXYmb00PvYft5uvGZIF3+Pj+Cj1ib8doKOJKTp+JwOVC+wHJX3cKVaM7YoRr95u7vQzdQcvcsL0oAOzwAAEjcxsh5tn9RFbfCx5JIfLLhEqVjiQ9nYLrVp3dj21KpmRd5EJ2waY7f+fC2CL/UBoArCDwAA2J1eTYOoXW3VM1qzHFTF15OGtKlh8XKBfUD4AQAAm6JL92Y3Z/VfX6zJiynWsBaIqWetBtU/A1uF8AMAABb30ytt+QiwhkHiEWJMn+ZV+XV7NfMBsc7Tqoa/Lx0UTnWrlKNFA5vz+5rWFa9UTnUHbTCNNqG6z+VkDejwDAAAFseWzmCXAcsPlXmsSnlPujAvijxdS2sPfh3djqb8cZqm9KxP/2utem0wNrGicHLFCl5u1KiqL128m2GmdwHqaAqetgA1PwAAYHPYsH5n59IGsI51A+jIez3UBh9V2PM3v9mJ7MXgNprf23+C96Kqxgx0h/ADAAAOSxigzM3YQKJtMseKgrXZbL1ZydYh/AAAgNW89GR5jrZW/jLv+6SfkVzCgj40S82Ej+pseKOj4na1Cl56l6GCt3hWbWXCaKRPpnu+ZTXaPaWr3uVxZDYRfpYvX06hoaHk6elJ7dq1o9jYWLX7
fv/999S5c2eqWLEiv0RGRmrcHwAAbNfAltVoy1ud+QSL1jLv2Sa0fGjLMjVGr3Yqmb1aV4Yu3yGnLc8IK4Z0XfJjzag2tOTFcKoV4EOWJNMw0s4WWD38rFu3jiZPnkxz5syhEydOUFhYGEVFRdH9+/dV7r93714aMmQI7dmzh2JiYigkJIR69uxJSUlJFi87AAAYh32JNw72NTo4GCM8pILBz7XksmNOOk0CoPScJwXE+mg2Fn6WLFlCY8aMoVGjRlHjxo3pm2++IW9vb1q1apXK/X/99Vd6/fXXKTw8nBo2bEgrV66k4uJiio6OtnjZAQDAfh2Y2p1+G9OOmlc3PPyoo2EaIq56xbLNYsp1JZV83Kl+YDnFfWF+cdYxzFgr8shsu+LHuuEnPz+f4uLieNOVokDOzvw+q9XRRU5ODhUUFJC/v+r24ry8PMrIyBBdAABAOl7vVodqV/YhdxfxV16Ivzd1qBOguC/vdxToWzoHEGuSM0SdyqWhRVm/sGAarUOTGlugdWDL6or7wrzj5yXuHzSzbyOVx9A1JJmapgkmSerh58GDB1RUVESBgeLVftn9e/fu6XSMadOmUXBwsChACS1cuJD8/PwUF9ZMBgAA0jG1V0PaPaUblffUPLXdV8Na0LiudejP1zootrEmOTbEXN2Ei6o6PbMOxp88mWiRcXMpG0Be7hDKj+svGMGlT7NXp3qloY0J8lO9Lpsw+7D10MzFTek92nj2sX6zlzEWLVpEa9eupX/++Yd3llZlxowZlJ6errjcunXL4uUEAADr0/Z9zCZXZIuh1qjkLdretJofrR0boXIou/KXPOs/xDoYs7XF5CPIlEMH6wzM+uCw48o7VbMJH7URBplyHq70Qf8mpY+RE18UVpmHa+nX/IzeDWnb251FTWmmsu/d7qL7Mhuf5tCq4ScgIIBcXFwoOTlZtJ3dDwrSvKrvp59+ysPPjh07qHnz0oStzMPDg3x9fUUXAACQni8Gh/Pruf30G8JuqCUvhtHf4yNo8tMNRNuFsYDVNP01LoK+G95KZf8c0QgvwXbWgiefJkC+3/a3O9OPr4hHzbWsUVGwjxM1DFLfuZwFsC+HtCBfLTVkykuKnJnbk4IreFGT4NLvV9T8aODu7k6tWrUSdVaWd16OiIhQ+7xPPvmE5s+fT9u2baPWrVtbqLQAAGDPOterTFc+6k0jO+o3hN1QHq4u1KqmP7koTcoj7HvEHmsd6s8Diba8IByxxW4Lj8tusvXKuirVIKma5HH+s01VHv/rYS2pf1gwnZkbVeaxw9OfUvmcNrX8ydezpP/RX+M6KGrHbDz7WL/Ziw1zZ3P3/Pjjj3Tx4kUaP348ZWdn89FfzIgRI3jTldzHH39Ms2bN4qPB2NxArG8Qu2RlZVnxXQAAgD1wU+r0rA/WAVnZgPBgft26ZmkNizZTe4lrgnQlnuRQHGrqB+o+u3RYSAXq1aRs64qLhs7RrGZHsZ+zEx8px/osCSdz9HJ3oYk96tnFPD9WX9h00KBBlJKSQrNnz+Yhhg1hZzU68k7QiYmJfASY3IoVK/gosRdeeEF0HDZP0Ny5cy1efgAAkAa2Wvy0v87wWppdF5Pp44HN6Jnmwby56KmGVXQOHlX9dJv9meUHYSdn8VD3kus973SjlMw8qq1hdJkqPRpVoW3nxQOLdB0Y5vRkpJyq4UPyY9h49rF++GEmTJjAL+omNRS6ceOGhUoFAABQitVy/DK6Hb+dk1/IF19lnhcMR9dKj1TQOlRcmyQMQvKaHzZzs/LszS9H1KQfY25qPPbAltWpUjl38nF3pUHfHeHbhM1oIzuE0prDhnzflhzDxrOP9Zu9AAAA7I08+JjLtF4N6b0+4rl7nHT8xq5RSftSFs7OTvRUw0DREHlhxc+cfo1p37vdKEgwak2XGqLSmh/bjj82UfMDAAAApcZ3q6NxtJem1er1CR41/L15/x9fL1dyFfSHYh2qa1byoUPTn1LUCHWuF0AHrjyg
wW1qqD1ep7oBfBFVay5XoguEHwAAAAsxVX2IplXdW+nR+drJyYm+Gd5K7ePCprAVL7WiI9dSy0ywKOTj4ap3/yNrQPgBAACwEGNag4plui1y2qJGRfp9THsK8detY7Wu2MSKkY3FKzLYK4QfAAAAOyBsztJU88NE1Klk/gLZMXR4BgAAsFFsCQw5YT+acnrMwgxl4ewBAABYiKY1ryIbBdKKvddEK7a3r12Jvh3eiupU9uHhZ+MbHfkRzD3azNHh7AEAAFgIW/JCU0dltvCocDZlJkowGzObJBGMh2YvAAAAM/vsf2EUWsmbzwqtCVt4VL5WFpgPan4AAADMbGCr6vwCtgE1PwAAACApCD8AAAAgKQg/AAAAICkIPwAAACApCD8AAAAgKQg/AAAAICkIPwAAACApCD8AAAAgKQg/AAAAICkIPwAAACApCD8AAAAgKQg/AAAAICkIPwAAACApCD8AAAAgKa4kMTKZjF9nZGRYuygAAACgI/n3tvx73BiSCz+ZmZn8OiQkxNpFAQAAAAO+x/38/MgYTjJTRCg7UlxcTHfu3KHy5cuTk5OTyVMpC1W3bt0iX19fkiqchxI4D6VwLkrgPJTAeSiFc6H7eWBxhQWf4OBgcnY2rteO5Gp+2AmrXr26WV+D/eCk/CGWw3kogfNQCueiBM5DCZyHUjgXup0HY2t85NDhGQAAACQF4QcAAAAkBeHHhDw8PGjOnDn8WspwHkrgPJTCuSiB81AC56EUzoV1zoPkOjwDAACAtKHmBwAAACQF4QcAAAAkBeEHAAAAJAXhBwAAACQF4cdEli9fTqGhoeTp6Unt2rWj2NhYciRz587lM2ILLw0bNlQ8/vjxY3rjjTeoUqVKVK5cORo4cCAlJyeLjpGYmEh9+/Ylb29vqlKlCr377rtUWFhItmz//v3Ur18/PqMoe88bNmwQPc7GC8yePZuqVq1KXl5eFBkZSVeuXBHtk5aWRsOGDeMTd1WoUIFeffVVysrKEu1z5swZ6ty5M//8sFlOP/nkE7K3czFy5Mgyn5FevXo53LlYuHAhtWnThs8Szz7HAwYMoMuXL4v2MdXvw969e6lly5Z8BEzdunVpzZo1ZE/noVu3bmU+E+PGjXOo87BixQpq3ry5YnK+iIgI2rp1q6Q+C7qeC5v6PLDRXmCctWvXytzd3WWrVq2SnT9/XjZmzBhZhQoVZMnJyTJHMWfOHFmTJk1kd+/eVVxSUlIUj48bN04WEhIii46Olh0/flzWvn17WYcOHRSPFxYWypo2bSqLjIyUnTx5UrZlyxZZQECAbMaMGTJbxsr5/vvvy9avX89GRcr++ecf0eOLFi2S+fn5yTZs2CA7ffq0rH///rJatWrJcnNzFfv06tVLFhYWJjty5IjswIEDsrp168qGDBmieDw9PV0WGBgoGzZsmOzcuXOy33//Xebl5SX79ttvZfZ0Ll5++WX+XoWfkbS0NNE+jnAuoqKiZKtXr+blO3XqlKxPnz6yGjVqyLKyskz6+5CQkCDz9vaWTZ48WXbhwgXZsmXLZC4uLrJt27bJ7OU8dO3alf89FH4m2M/Ykc7Dv//+K9u8ebMsPj5edvnyZdl7770nc3Nz4+dFKp8FXc+FLX0eEH5MoG3btrI33nhDcb+oqEgWHBwsW7hwocyRwg/70lLl0aNH/AP+559/KrZdvHiRf0HGxMTw++xD7OzsLLt3755inxUrVsh8fX1leXl5Mnug/IVfXFwsCwoKki1evFh0Ljw8PPiXNsN+Odnzjh07pthn69atMicnJ1lSUhK///XXX8sqVqwoOg/Tpk2TNWjQQGar1IWfZ599Vu1zHPVc3L9/n7+vffv2mfT3YerUqfwfHEKDBg3iocMezoP8y27ixIlqn+OI54Fhn+GVK1dK9rOg6lzY2ucBzV5Gys/Pp7i4ON7cIVw/jN2PiYkhR8Kac1iTR+3atXnTBaueZNj7LygoEJ0D1iRWo0YNxTlg182aNaPAwEDF
PlFRUXwxu/Pnz5M9un79Ot27d0/0vtm6M6zZU/i+WfNO69atFfuw/dln5OjRo4p9unTpQu7u7qJzw5oQHj58SPaEVUezquoGDRrQ+PHjKTU1VfGYo56L9PR0fu3v72/S3we2j/AY8n1s9e+K8nmQ+/XXXykgIICaNm1KM2bMoJycHMVjjnYeioqKaO3atZSdnc2bfKT6WVB1Lmzt8yC5hU1N7cGDB/yHLPxhMez+pUuXyFGwL3TWrsq+1O7evUsffPAB75dx7tw5HgDYlxX7YlM+B+wxhl2rOkfyx+yRvNyq3pfwfbMwIOTq6sq/IIT71KpVq8wx5I9VrFiR7AHr3/P888/z93Lt2jV67733qHfv3vyPkouLi0Oei+LiYnr77bepY8eO/I85Y6rfB3X7sC+C3Nxc3sfMls8DM3ToUKpZsyb/RxPryzVt2jQeZNevX+9Q5+Hs2bP8C57172H9ev755x9q3LgxnTp1SnKfhbNqzoWtfR4QfkAn7EtMjnVoY2GIfYj/+OMPm/rFA+sZPHiw4jb71xv7nNSpU4fXBvXo0YMcEevIyv4BcPDgQZIydedh7Nixos8EGxjAPgssHLPPhqNg/yhkQYfVfv3111/08ssv0759+0iKGqg5FywA2dLnAc1eRmLVd+xftcq999n9oKAgclTsXzL169enq1ev8vfJmv8ePXqk9hywa1XnSP6YPZKXW9PPnl3fv39f9DgbucBGPTnyuWFY8yj7/WCfEUc8FxMmTKD//vuP9uzZQ9WrV1dsN9Xvg7p92CgaW/oHh7rzoAr7RxMj/Ew4wnlgtTts1FGrVq34KLiwsDD64osvJPdZ0HQubO3zgPBjgh80+yFHR0eLqoDZfWE7p6Nhw5NZWmfJnb1/Nzc30TlgVZmsT5D8HLBrVh0q/PLbuXMn/8DKq0TtDWueYb+IwvfNql5Z/xXh+2Z/+Fjbv9zu3bv5Z0T+i8/2YcPIWd8A4blh/4KytWYefdy+fZv3+WGfEUc6F6y/N/vCZ9X5rPzKzXSm+n1g+wiPId/HVv6uaDsPqrAaAUb4mbD386AK+0zn5eVJ5rOgy7mwuc+DXt2jQe1QdzbCZ82aNXxEy9ixY/lQd2GPdXs3ZcoU2d69e2XXr1+XHTp0iA9FZEMQ2QgP+XBONsx19+7dfDhnREQEvygPYezZsycfFsuGJVauXNnmh7pnZmbyIZfswn5dlixZwm/fvHlTMdSd/aw3btwoO3PmDB/tpGqoe4sWLWRHjx6VHTx4UFavXj3R8G42IoQN7x4+fDgfEso+T2wopy0N79Z2Lthj77zzDh/Bwj4ju3btkrVs2ZK/18ePHzvUuRg/fjyf3oD9PgiH7Obk5Cj2McXvg3xI77vvvstHCC1fvtymhjdrOw9Xr16VzZs3j79/9plgvyO1a9eWdenSxaHOw/Tp0/kIN/Ye2d8Adp+NYNyxY4dkPgu6nAtb+zwg/JgIm2uAfcDZfD9s6Dubx8SRsKGEVatW5e+vWrVq/D77MMuxL/vXX3+dD2tkH8znnnuO/yEUunHjhqx379583hYWnFigKigokNmyPXv28C965Qsb1i0f7j5r1iz+hc0CcI8ePfj8FkKpqan8C75cuXJ8yOaoUaN4WBBicwR16tSJH4OdXxaq7OlcsC889geL/aFiQ3tr1qzJ5/NQ/geAI5wLVeeAXdicN6b+fWDnPDw8nP/esS8K4WvY+nlITEzkX2z+/v78Z8nmdGJfWMJ5XRzhPLzyyiv8887Kxj7/7G+APPhI5bOgy7mwtc+DE/uffnVFAAAAAPYLfX4AAABAUhB+AAAAQFIQfgAAAEBSEH4AAABAUhB+AAAAQFIQfgAAAEBSEH4AAABAUhB+AAAAQFIQfgDAbEJDQ2np0qU6789WgHdyciqzEKSj0vf8AIBpuJroOADgALp160bh4eEm+0I+duwY+fj46Lx/hw4d6O7du+Tn52eS1wcAUAXhBwD0wlbEKSoqIldX7X8+Kleu
rNex3d3dKSgoyIjSAQBoh2YvAOBGjhxJ+/btoy+++II3PbHLjRs3FE1RW7dupVatWpGHhwcdPHiQrl27Rs8++ywFBgZSuXLlqE2bNrRr1y6NzTrsOCtXrqTnnnuOvL29qV69evTvv/+qbfZas2YNVahQgbZv306NGjXir9OrVy9eOyRXWFhIb731Ft+vUqVKNG3aNHr55ZdpwIABGt8vew+dO3cmLy8vCgkJ4cfIzs4WlX3+/Pk0ZMgQXntVrVo1Wr58uegYiYmJ/Bywcvn6+tKLL75IycnJon02bdrEz42npycFBATw9y6Uk5NDr7zyCpUvX55q1KhB3333nY4/MQAwFMIPAHAs9ERERNCYMWN4uGAXFgrkpk+fTosWLaKLFy9S8+bNKSsri/r06UPR0dF08uRJHkr69evHA4EmH3zwAQ8JZ86c4c8fNmwYpaWlqd2fhYNPP/2Ufv75Z9q/fz8//jvvvKN4/OOPP6Zff/2VVq9eTYcOHaKMjAzasGGDxjKw4MbKO3DgQF6OdevW8TA0YcIE0X6LFy+msLAw/v7Y+584cSLt3LmTP1ZcXMyDDys7C41se0JCAg0aNEjx/M2bN/Oww94nOwY7V23bthW9xmeffUatW7fmj7/++us0fvx4unz5ssbyA4CR9F4HHgAcVteuXWUTJ04UbduzZ4+M/anYsGGD1uc3adJEtmzZMsX9mjVryj7//HPFfXacmTNnKu5nZWXxbVu3bhW91sOHD/n91atX8/tXr15VPGf58uWywMBAxX12e/HixYr7hYWFsho1asieffZZteV89dVXZWPHjhVtO3DggMzZ2VmWm5urKHuvXr1E+wwaNEjWu3dvfnvHjh0yFxcXWWJiouLx8+fP8/LGxsby+xEREbJhw4apLQd7jZdeeklxv7i4WFalShXZihUr1D4HAIyHmh8A0AmrnRBiNT+sBoY1R7EmJ9b0w2qFtNX8sFojOdacxJqL7t+/r3Z/1jxWp04dxf2qVasq9k9PT+fNTMLaFBcXF948p8np06d5kxors/wSFRXFa3OuX7+u2I/VhAmx++w9Muya1YwJa8caN27Mz4V8n1OnTlGPHj10Ph+syY/1edJ0PgDAeOjwDAA6UR61xYIPa+phTVJ169blfWdeeOEFys/P13gcNzc30X32hc9Chz77l1QiGY4Ft9dee43381HG+t2YCjsn2uh7PgDAeKj5AQDRaCs2kksXrH8N6yTN+rQ0a9aM11iwDtKWxIbEsw7XbEi9HCv/iRMnND6vZcuWdOHCBR7alC/sHMgdOXJE9Dx2n9V0Mez61q1b/CLHjsk6a7MaIHmtDuvnAwC2BTU/ACAa4XT06FEeYlhTkL+/v9p92Uit9evX807OrLZi1qxZVqmxePPNN2nhwoU8uDRs2JCWLVtGDx8+5GVSh40Ia9++Pe/gPHr0aF6rxYILq8n66quvRAHvk08+4SPH2GN//vkn78TMREZG8tDHOmyzEW1s1BnrsNy1a1dFE+GcOXN4sxdrths8eDDfZ8uWLfz1AcB6UPMDAKKmLNZnhtVcsDl6NPXfWbJkCVWsWJFPTMgCEOszw2pULI0FCTYcfcSIEbxPjrz/Dhtarg6rkWEjtOLj4/lw9xYtWtDs2bMpODhYtN+UKVPo+PHj/PEPP/yQv2d2bIaFq40bN/Jz0KVLFx6GateuzUeOCSeNZIGJDednk0c+9dRTFBsba8azAQC6cGK9nnXaEwDADrDaJ9YkxYbTs3l6jKkFe/vtt/kFABwLmr0AwK7dvHmTduzYwZub8vLyeLMVG7E1dOhQaxcNAGwUmr0AwK45OzvzYetsFuWOHTvS2bNn+UzT8o7JAADK0OwFAAAAkoKaHwAAAJAUhB8AAACQFIQfAAAAkBSEHwAAAJAUhB8AAACQFIQfAAAAkBSEHwAAAJAUhB8AAAAgKfk/kskKiKadjNkAAAAASUVORK5CYII=",
      "text/plain": [
       "<Figure size 640x480 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "eval_loss = 0.2870\n"
     ]
    }
   ],
   "source": [
    "import torch\n",
    "from torch import nn\n",
    "\n",
    "class LR(nn.Module):\n",
    "    def __init__(self, input_dim, output_dim):\n",
    "        super(LR, self).__init__()\n",
    "        self.linear = nn.Linear(input_dim, output_dim)\n",
    "        \n",
    "    def forward(self, input_feats, labels=None):\n",
    "        outputs = self.linear(input_feats)\n",
    "        \n",
    "        if labels is not None:\n",
    "            loss_fc = nn.CrossEntropyLoss()\n",
    "            loss = loss_fc(outputs, labels)\n",
    "            return (loss, outputs)\n",
    "        \n",
    "        return outputs\n",
    "\n",
    "model = LR(len(dataset.token2id), len(dataset.label2id))\n",
    "\n",
    "from torch.utils.data import Dataset, DataLoader\n",
    "from torch.optim import SGD, Adam\n",
    "\n",
    "# Batches are iterated with PyTorch's DataLoader, so we implement two\n",
    "# classes that follow PyTorch's interfaces: myDataset and DataCollator.\n",
    "# myDataset is a thin wrapper around the feature vectors and labels;\n",
    "# DataCollator converts each batch of data into PyTorch tensors.\n",
    "class myDataset(Dataset):\n",
    "    def __init__(self, X, Y):\n",
    "        self.X = X\n",
    "        self.Y = Y\n",
    "        \n",
    "    def __len__(self):\n",
    "        return len(self.X)\n",
    "\n",
    "    def __getitem__(self, idx):\n",
    "        return (self.X[idx], self.Y[idx])\n",
    "\n",
    "class DataCollator:\n",
    "    @classmethod\n",
    "    def collate_batch(cls, batch):\n",
    "        feats, labels = [], []\n",
    "        for x, y in batch:\n",
    "            feats.append(x)\n",
    "            labels.append(y)\n",
    "        # Converting a list of ndarrays directly into a tensor is very slow,\n",
    "        # so the list is first turned into a single ndarray.\n",
    "        feats = torch.tensor(np.array(feats), dtype=torch.float)\n",
    "        labels = torch.tensor(np.array(labels), dtype=torch.long)\n",
    "        return {'input_feats': feats, 'labels': labels}\n",
    "\n",
    "# Set the training hyperparameters and the optimizer, and prepare the model\n",
    "epochs = 50\n",
    "batch_size = 128\n",
    "learning_rate = 1e-3\n",
    "weight_decay = 0\n",
    "\n",
    "train_dataset = myDataset(train_F, train_Y)\n",
    "test_dataset = myDataset(test_F, test_Y)\n",
    "\n",
    "data_collator = DataCollator()\n",
    "train_dataloader = DataLoader(train_dataset, batch_size=batch_size,\\\n",
    "    shuffle=True, collate_fn=data_collator.collate_batch)\n",
    "test_dataloader = DataLoader(test_dataset, batch_size=batch_size,\\\n",
    "    shuffle=False, collate_fn=data_collator.collate_batch)\n",
    "optimizer = Adam(model.parameters(), lr=learning_rate,\\\n",
    "    weight_decay=weight_decay)\n",
    "model.zero_grad()\n",
    "model.train()\n",
    "\n",
    "from tqdm import tqdm, trange\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Train the model\n",
    "with trange(epochs, desc='epoch', ncols=60) as pbar:\n",
    "    epoch_loss = []\n",
    "    for epoch in pbar:\n",
    "        model.train()\n",
    "        for step, batch in enumerate(train_dataloader):\n",
    "            loss = model(**batch)[0]\n",
    "            pbar.set_description(f'epoch-{epoch}, loss={loss.item():.4f}')\n",
    "            loss.backward()\n",
    "            optimizer.step()\n",
    "            model.zero_grad()\n",
    "            epoch_loss.append(loss.item())\n",
    "\n",
    "    epoch_loss = np.array(epoch_loss)\n",
    "    # Plot the loss curve (one point per training step)\n",
    "    plt.plot(range(len(epoch_loss)), epoch_loss)\n",
    "    plt.xlabel('training epoch')\n",
    "    plt.ylabel('loss')\n",
    "    plt.show()\n",
    "    \n",
    "    model.eval()\n",
    "    with torch.no_grad():\n",
    "        loss_terms = []\n",
    "        for batch in test_dataloader:\n",
    "            loss = model(**batch)[0]\n",
    "            loss_terms.append(loss.item())\n",
    "        print(f'eval_loss = {np.mean(loss_terms):.4f}')"
   ]
  },
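  {
   "cell_type": "markdown",
   "id": "10808854",
   "metadata": {},
   "source": [
    "The code below uses the trained model to make predictions on the test set and prints the predictions for the first few examples alongside their labels."
   ]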
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "11a9bf62",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "test example-0, prediction = 0, label = 0\n",
      "test example-1, prediction = 0, label = 0\n",
      "test example-2, prediction = 1, label = 1\n",
      "test example-3, prediction = 1, label = 1\n",
      "test example-4, prediction = 1, label = 1\n"
     ]
    }
   ],
   "source": [
    "LR_preds = []\n",
    "model.eval()\n",
    "for batch in test_dataloader:\n",
    "    with torch.no_grad():\n",
    "        _, preds = model(**batch)\n",
    "        preds = np.argmax(preds, axis=1)\n",
    "        LR_preds.extend(preds)\n",
    "            \n",
    "for i, (p, y) in enumerate(zip(LR_preds, test_Y)):\n",
    "    if i >= 5:\n",
    "        break\n",
    "    print(f'test example-{i}, prediction = {p}, label = {y}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c5feb65e",
   "metadata": {},
   "source": [
    "The code below shows how the macro-average and the micro-average are computed in the multi-class setting."
   ]
  },
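  {
   "cell_type": "markdown",
   "id": "f1-averaging-note",
   "metadata": {},
   "source": [
    "For reference, the two averages can be written in a common textbook form. The symbols used here ($C$ for the number of classes; $TP_i$, $FP_i$, $FN_i$ for the true positives, false positives, and false negatives of class $i$) are notation assumed for this note rather than names taken from the code.\n",
    "\n",
    "$$P_{\\mathrm{micro}} = \\frac{\\sum_{i=1}^{C} TP_i}{\\sum_{i=1}^{C} (TP_i + FP_i)}, \\qquad R_{\\mathrm{micro}} = \\frac{\\sum_{i=1}^{C} TP_i}{\\sum_{i=1}^{C} (TP_i + FN_i)}, \\qquad F_1^{\\mathrm{micro}} = \\frac{2\\, P_{\\mathrm{micro}}\\, R_{\\mathrm{micro}}}{P_{\\mathrm{micro}} + R_{\\mathrm{micro}}}$$\n",
    "\n",
    "$$P_i = \\frac{TP_i}{TP_i + FP_i}, \\qquad R_i = \\frac{TP_i}{TP_i + FN_i}, \\qquad F_{1,i} = \\frac{2\\, P_i\\, R_i}{P_i + R_i}, \\qquad F_1^{\\mathrm{macro}} = \\frac{1}{C} \\sum_{i=1}^{C} F_{1,i}$$\n",
    "\n",
    "When every example carries exactly one label, each misclassification counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so $\\sum_i FP_i = \\sum_i FN_i$ and the micro-averaged F1 coincides with accuracy. The macro average instead weights every class equally, regardless of how many examples it has."
   ]
  },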
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "a5ac32c5",
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "NB: micro-f1 = 0.8961520630505331, macro-f1 = 0.8948572078813896\n",
      "LR: micro-f1 = 0.914696337505795, macro-f1 = 0.9140331854037965\n"
     ]
    }
   ],
   "source": [
    "test_Y = np.array(test_Y)\n",
    "NB_preds = np.array(NB_preds)\n",
    "LR_preds = np.array(LR_preds)\n",
    "\n",
    "def micro_f1(preds, labels):\n",
    "    TP = np.sum(preds == labels)\n",
    "    FN = FP = 0\n",
    "    for i in range(len(dataset.label2id)):\n",
    "        # False positive: predicted class i, but the true label differs\n",
    "        FP += np.sum((preds == i) & (labels != i))\n",
    "        # False negative: true label is i, but another class was predicted\n",
    "        FN += np.sum((preds != i) & (labels == i))\n",
    "    precision = TP / (TP + FP)\n",
    "    recall = TP / (TP + FN)\n",
    "    f1 = 2 * precision * recall / (precision + recall)\n",
    "    return f1\n",
    "\n",
    "def macro_f1(preds, labels):\n",
    "    f_scores = []\n",
    "    for i in range(len(dataset.label2id)):\n",
    "        TP = np.sum((preds == i) & (labels == i))\n",
    "        # False positive: predicted class i, but the true label differs\n",
    "        FP = np.sum((preds == i) & (labels != i))\n",
    "        # False negative: true label is i, but another class was predicted\n",
    "        FN = np.sum((preds != i) & (labels == i))\n",
    "        precision = TP / (TP + FP)\n",
    "        recall = TP / (TP + FN)\n",
    "        f1 = 2 * precision * recall / (precision + recall)\n",
    "        f_scores.append(f1)\n",
    "    return np.mean(f_scores)\n",
    "\n",
    "print(f'NB: micro-f1 = {micro_f1(NB_preds, test_Y)}, '+\\\n",
    "      f'macro-f1 = {macro_f1(NB_preds, test_Y)}')\n",
    "print(f'LR: micro-f1 = {micro_f1(LR_preds, test_Y)}, '+\\\n",
    "      f'macro-f1 = {macro_f1(LR_preds, test_Y)}')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
